AI Agents Crash Course

Learn how to build AI agents from scratch: from ingesting and indexing GitHub repositories to chunking, search, tool use, and evaluation

Introduction to AI Agents

Welcome to the AI Agents Crash Course! In this series, you'll learn how to build intelligent AI agents step by step.

What You'll Learn

This course covers the complete journey of building AI agents:

  • Day 1: Ingest and index data from GitHub repositories
  • Day 2: Chunking and intelligent processing
  • Day 3: Search - Text, Vector, and Hybrid
  • Day 4: Agents and Tools - Building the conversational agent
  • Day 5: Evaluation - Testing and measuring agent quality

Prerequisites

  • Basic Python knowledge
  • Docker installed on your machine
  • Familiarity with command line

Credits

This course is inspired by and based on the AI Hero program by Alexey Grigorev.

Code Repository

Follow along with the complete code at: GitHub Repository

Let's get started!

---PAGE---

Day 1: Ingest and Index Your Data

The first step in building any AI agent is gathering and processing data. Today we'll learn how to:

  • Download repositories as zip archives
  • Parse frontmatter metadata
  • Extract content from markdown files

What is Frontmatter?

Frontmatter is a format used in markdown files where YAML metadata is placed at the top of the file between `---` markers. This is very useful because we can extract structured information (like title, tags, difficulty level) along with the content.

---
title: "Getting Started with AI"
author: "John Doe"
date: "2024-01-15"
tags: ["ai", "machine-learning", "tutorial"]
difficulty: "beginner"
---

# Getting Started with AI

This is the main content of the document written in **Markdown**.

You can include code blocks, links, and other formatting here.

Key Components:

  • YAML Header: Metadata between `---` markers
  • Content: Everything after the closing `---`
  • Structured Data: Title, author, tags, etc.
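
The split itself needs only the standard library. A minimal sketch (handles flat `key: value` pairs only; real pipelines typically use PyYAML or the python-frontmatter package):

```python
def split_frontmatter(text: str) -> tuple[dict, str]:
    """Split a markdown document into (metadata, content).

    Minimal sketch: handles only flat 'key: "value"' pairs;
    lists and nested YAML need a real YAML parser.
    """
    if not text.startswith('---'):
        return {}, text
    # The header sits between the first two '---' markers
    _, header, content = text.split('---', 2)
    metadata = {}
    for line in header.strip().splitlines():
        key, _, value = line.partition(':')
        metadata[key.strip()] = value.strip().strip('"')
    return metadata, content.strip()
```

Feeding the example document above yields `metadata['title'] == 'Getting Started with AI'` and the markdown body as `content`.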

---PAGE---

Downloading Repository Data

We can download entire GitHub repositories as zip archives using the GitHub codeload API.

import io
import zipfile
import requests

def read_repo_data(repo_owner: str, repo_name: str) -> list[dict]:
    prefix = 'https://codeload.github.com'
    url = f'{prefix}/{repo_owner}/{repo_name}/zip/refs/heads/main'
    resp = requests.get(url, timeout=30)

    if resp.status_code != 200:
        raise Exception(f"Failed to download repository: {resp.status_code}")

    zf = zipfile.ZipFile(io.BytesIO(resp.content))
    # ... process files
    zf.close()

Why This Approach?

  • No need to clone the repository
  • Works with any public repository
  • Processes everything in memory
  • Fast and efficient
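
The `# ... process files` step elided above might look like this (the `extract_markdown` helper and its `filename`/`content` field names are illustrative, not the course's exact code):

```python
import zipfile

def extract_markdown(zf: zipfile.ZipFile) -> list[dict]:
    """Collect markdown files from the archive into dicts.

    Sketch of the elided processing step: keep only .md/.mdx
    entries and decode their contents in memory.
    """
    docs = []
    for info in zf.infolist():
        if not info.filename.lower().endswith(('.md', '.mdx')):
            continue
        with zf.open(info) as f:
            content = f.read().decode('utf-8', errors='ignore')
        docs.append({'filename': info.filename, 'content': content})
    return docs
```

Each dict can then be passed through the frontmatter parser before indexing.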

---PAGE---

Day 2: Chunking and Intelligent Processing

Now that we can ingest data, we need to break it into manageable pieces. AI models have token limits, and large documents need to be chunked intelligently.

Today we'll learn about:

  • Simple chunking: Fixed-size chunks without overlap
  • Sliding window: Overlapping chunks for context preservation
  • Paragraph-based: Split by natural paragraph breaks
  • Section-based: Split by markdown headers
  • LLM-based: AI-powered semantic chunking using OpenRouter

Why Chunking Matters

The key insight: start simple, evaluate, then iterate. Most use cases don't require sophisticated chunking methods.
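
The sliding-window method above (simple chunking is the special case `step == size`) can be sketched in a few lines; the `size`/`step` defaults here are illustrative:

```python
def sliding_window(text: str, size: int = 500, step: int = 250) -> list[str]:
    """Overlapping fixed-size chunks; step < size gives overlap.

    Minimal sketch of the sliding-window method: with step == size
    it degenerates to simple (non-overlapping) chunking.
    """
    if size <= 0 or step <= 0:
        raise ValueError("size and step must be positive")
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            break
    return chunks
```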

---PAGE---

Real Benchmark Results

We tested all chunking methods on a sample markdown document (~1,900 characters). Here are the actual results:

Benchmark Comparison Table

| Method    | Chunks | Avg Size  | Min | Max | Has Topics  |
|-----------|--------|-----------|-----|-----|-------------|
| Simple    | 4      | 472 chars | 387 | 500 | No          |
| Sliding   | 7      | 484 chars | 387 | 500 | No          |
| Paragraph | 17     | 109 chars | 6   | 369 | No          |
| Section   | 3      | 538 chars | 239 | 956 | Header only |
| LLM       | 8      | 234 chars | 124 | 369 | Yes         |

Key Insights from Benchmark

Simple Chunking: Created 4 chunks with 3 full-size (500 chars) and 1 partial. Predictable but cuts mid-sentence.

Sliding Window: Created 7 chunks with 50% overlap. More chunks but better context preservation at boundaries.

Paragraph: Created 17 small chunks (avg 109 chars). Great for granular retrieval but many tiny chunks.

Section: Only 3 chunks following markdown headers. Best for well-structured documents with clear sections.

LLM (Gemini 2.0 Flash): Created 8 semantically meaningful chunks with topics like "Introduction", "FAQ Proposal Form - Steps", "Writing Good FAQ Guidelines". Each chunk is a complete thought.

---PAGE---

LLM Chunking Output Example

Chunk 1: Topic: "Introduction"
  Content: # Contributing to DataTalksClub FAQ...

Chunk 2: Topic: "FAQ Proposal Form - Introduction"
  Content: ## FAQ Proposal Form - We have an automated system...

Chunk 3: Topic: "FAQ Proposal Form - Steps"
  Content: 1. Go to the FAQ Proposal form 2. Fill out the form...

Chunk 4: Topic: "FAQ Bot Actions"
  Content: After that, our FAQ bot will: 1. Analyze your proposal...

Chunk 5: Topic: "FAQ Bot Outcomes"
  Content: If it's NEW or UPDATE, the bot will create a PR...

Chunk 6: Topic: "Writing Good FAQ Guidelines - Question"
  Content: ## Writing Good FAQ Guidelines - Question: Be specific...

Chunk 7: Topic: "Writing Good FAQ Guidelines - Answer"
  Content: Answer - Start with a direct answer, Include code examples...

Chunk 8: Topic: "Questions or Issues"
  Content: ## Questions or Issues? If you have questions...
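
Output like the above comes from a single LLM call. A sketch of the request payload (the prompt wording and the `build_chunking_request` helper are illustrative, not the course's exact code):

```python
CHUNKING_PROMPT = """Split the document into semantically coherent chunks.
Return a JSON list: [{"topic": "...", "content": "..."}]"""

def build_chunking_request(document: str,
                           model: str = "google/gemini-2.0-flash-001") -> dict:
    """Build an OpenRouter chat-completions payload for LLM chunking."""
    return {
        "model": model,
        "messages": [
            {"role": "system", "content": CHUNKING_PROMPT},
            {"role": "user", "content": document},
        ],
    }
```

The response's message content then needs `json.loads` plus validation, since models occasionally return malformed JSON.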

---PAGE---

Running Day 2

Build and Run

docker-compose build day2
docker run --rm ai_agent_crashcoourse-day2

Demo All Rule-Based Methods

docker run --rm ai_agent_crashcoourse-day2 python main.py --demo

LLM-Based Chunking (OpenRouter)

# Demo LLM chunking
docker run --rm -e OPENROUTER_API_KEY=your_key \
  ai_agent_crashcoourse-day2 python main.py --demo-llm

Default model: `google/gemini-2.0-flash-001` (fast and cheap)

---PAGE---

Day 2 Key Takeaways

Chunking Methods Comparison

| Method    | Best For           | Pros                           | Cons                   |
|-----------|--------------------|--------------------------------|------------------------|
| Simple    | Uniform processing | Predictable sizes              | May cut mid-sentence   |
| Sliding   | General purpose    | Context preservation           | More chunks (overlap)  |
| Paragraph | Natural text       | Semantic units                 | Variable sizes         |
| Section   | Structured docs    | Logical grouping               | Needs markdown headers |
| LLM       | High-quality RAG   | Semantic understanding, topics | API cost, slower       |

When to Use Each Method

  1. Start with Sliding Window: Good default for most use cases
  2. Use Section for well-structured markdown documentation
  3. Use Paragraph when you need granular retrieval
  4. Use LLM when chunk quality matters more than speed/cost
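
The section-based method reduces to a split on markdown headers. A sketch, assuming ATX-style `##` headers (naive: it would also match `##` lines inside code fences):

```python
import re

def split_by_sections(markdown: str, level: int = 2) -> list[str]:
    """Split markdown on headers of the given level (default '##')."""
    pattern = re.compile(rf'^{"#" * level} ', re.MULTILINE)
    starts = [m.start() for m in pattern.finditer(markdown)]
    if not starts:
        return [markdown]
    # Text before the first header becomes its own chunk
    bounds = ([0] if starts[0] > 0 else []) + starts + [len(markdown)]
    sections = [markdown[a:b].strip() for a, b in zip(bounds, bounds[1:])]
    return [s for s in sections if s]
```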

---PAGE---

Day 3: Search - Text, Vector, and Hybrid

Now that we have ingested and chunked our data, we need to make it searchable. Today we'll learn about three search approaches:

  • Text Search (Lexical): Fast keyword matching
  • Vector Search (Semantic): Find similar content using embeddings
  • Hybrid Search: Combine both for best results

Key Insight

Always start with the simplest approach. For search, that's text search. It's fast, efficient, works well for exact matches, and requires no model inference. Add complexity (vector search) only when basic approaches prove insufficient.

---PAGE---

Text Search with Minsearch

Text search identifies documents containing query words. We use the minsearch library for fast lexical matching.

from minsearch import Index

# Create index with searchable fields
index = Index(
    text_fields=["chunk", "title", "description", "filename"],
    keyword_fields=[]
)

# Fit the index with our chunks
index.fit(chunks)

# Search!
results = index.search("how to detect data drift", num_results=5)

When to Use Text Search:

  • Exact keyword matching
  • Specific technical terms
  • Fast queries without ML overhead
  • When you know the exact terminology
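
For reuse later (the hybrid search calls a `text_search` helper), the query can be wrapped in a small function; the signature here is this course's convention, not part of minsearch:

```python
def text_search(index, query: str, num_results: int = 5) -> list[dict]:
    """Run a lexical query against a fitted minsearch-style index."""
    return index.search(query, num_results=num_results)
```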

---PAGE---

Vector Search with Embeddings

Vector search uses embeddings to find semantically similar content, even when different words are used.

from sentence_transformers import SentenceTransformer
import numpy as np

# Load embedding model
model = SentenceTransformer('multi-qa-distilbert-cos-v1')

# Embed all chunks, normalized so a dot product equals cosine similarity
embeddings = []
for chunk in chunks:
    v = model.encode(chunk['chunk'], normalize_embeddings=True)
    embeddings.append(v)

embeddings = np.array(embeddings)

Searching with Vectors

def vector_search(query, model, embeddings, chunks, num_results=5):
    # Encode query (normalized, to match the chunk embeddings)
    query_embedding = model.encode(query, normalize_embeddings=True)

    # Cosine similarity reduces to a dot product for unit vectors
    similarities = np.dot(embeddings, query_embedding)

    # Get top results
    top_indices = np.argsort(similarities)[::-1][:num_results]
    return [chunks[i] for i in top_indices]

When to Use Vector Search:

  • Semantic similarity (find related concepts)
  • When users don't know exact terminology
  • Natural language queries
  • Finding conceptually similar content

---PAGE---

Hybrid Search

Hybrid search combines both approaches by running text and vector searches, then deduplicating results.

def hybrid_search(query, text_index, model, embeddings, chunks):
    # Get results from both methods
    text_results = text_search(text_index, query)
    vector_results = vector_search(query, model, embeddings, chunks)

    # Deduplicate
    seen_ids = set()
    combined = []

    for result in text_results + vector_results:
        chunk_id = result['filename'] + result['chunk'][:50]
        if chunk_id not in seen_ids:
            seen_ids.add(chunk_id)
            combined.append(result)

    return combined

Benefits of Hybrid Search:

  • Best of both worlds
  • Catches exact matches AND semantic matches
  • More robust retrieval

---PAGE---

Running Day 3

Build and Run

docker-compose build day3
docker run --rm ai_agent_crashcoourse-day3

Text Search Demo (Lightweight)

docker run --rm ai_agent_crashcoourse-day3 python main.py --demo-text

Full Demo (Downloads ~100MB embedding model)

docker run --rm ai_agent_crashcoourse-day3 python main.py --demo

---PAGE---

Day 3 Key Takeaways

Search Methods Comparison

| Method | Speed  | Accuracy      | Use Case                   |
|--------|--------|---------------|----------------------------|
| Text   | Fast   | Exact matches | Keywords, technical terms  |
| Vector | Slower | Semantic      | Natural language, concepts |
| Hybrid | Medium | Comprehensive | Production RAG systems     |

Practical Recommendations

  1. Start with Text Search: It's fast, simple, and often sufficient
  2. Add Vector Search when users need semantic matching
  3. Use Hybrid for production RAG applications
  4. Model choice matters: multi-qa-distilbert-cos-v1 is a good balance of speed and quality

Next Steps

In Day 4, we'll learn how to:

  • Connect to an LLM for answer generation
  • Build a complete RAG pipeline
  • Handle context and prompts effectively

---PAGE---

Day 4: Agents and Tools

An agent is fundamentally an LLM with the ability to invoke tools—external functions that enable information retrieval, calculations, or actions. Tools are what distinguish agents from basic chatbots.

Course Progress

  • Day 1: Data download from GitHub
  • Day 2: Data processing via chunking
  • Day 3: Data indexing for searchability
  • Day 4: Agent creation with tool access

Key Insight: Data preparation consumes most development time and represents the most critical component.

---PAGE---

Function Calling with OpenRouter

We use OpenRouter to access various LLMs. Developers must describe functions in a structured JSON format so LLMs understand how to invoke them:

import requests
import json

# Define the tool
text_search_tool = {
    "type": "function",
    "function": {
        "name": "text_search",
        "description": "Search the FAQ database for relevant information",
        "parameters": {
            "type": "object",
            "properties": {
                "query": {
                    "type": "string",
                    "description": "Search query text"
                }
            },
            "required": ["query"]
        }
    }
}

# OpenRouter API call with tools
def call_agent(user_message, tools):
    response = requests.post(
        "https://openrouter.ai/api/v1/chat/completions",
        headers={
            "Authorization": f"Bearer {OPENROUTER_API_KEY}",
            "Content-Type": "application/json"
        },
        json={
            "model": "google/gemini-2.0-flash-001",
            "messages": [
                {"role": "system", "content": system_prompt},
                {"role": "user", "content": user_message}
            ],
            "tools": tools
        }
    )
    return response.json()

The agent analyzes user input, determines necessary tool calls, executes them, and generates context-aware responses.

---PAGE---

System Prompts Matter

Instructions significantly influence agent behavior. Enhanced prompts encourage strategic tool usage:

system_prompt = """You are a helpful FAQ assistant.

IMPORTANT RULES:
1. Always search for relevant information before answering
2. Make multiple searches if needed for comprehensive answers
3. If the search returns no results, say you don't have that information
4. Base your answers ONLY on search results, not training data
5. Be concise but thorough

When users ask questions, use the text_search tool to find relevant FAQ entries."""

Why System Prompts Are Critical

| Without Good Prompt              | With Good Prompt                   |
|----------------------------------|------------------------------------|
| Agent answers from training data | Agent searches first, then answers |
| May hallucinate answers          | Grounded in your actual data       |
| Inconsistent tool usage          | Reliably uses tools                |

---PAGE---

Handling Tool Calls

When the LLM decides to use a tool, you need to execute it and send results back:

def run_agent_loop(user_question):
    messages = [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_question}
    ]

    while True:
        response = call_openrouter(messages, tools=[text_search_tool])
        message = response["choices"][0]["message"]

        # Check if agent wants to call a tool
        if message.get("tool_calls"):
            # Append the assistant message once, then one result per tool call
            messages.append(message)
            for tool_call in message["tool_calls"]:
                func_name = tool_call["function"]["name"]
                func_args = json.loads(tool_call["function"]["arguments"])

                # Execute the tool
                if func_name == "text_search":
                    result = text_search(func_args["query"])

                # Add the tool result to messages
                messages.append({
                    "role": "tool",
                    "tool_call_id": tool_call["id"],
                    "content": json.dumps(result)
                })
        else:
            # No more tool calls, return final answer
            return message["content"]

---PAGE---

Complete Agent Example

Here's a full working agent that searches your FAQ database:

import os
import json
import requests
from minsearch import Index

OPENROUTER_API_KEY = os.environ.get("OPENROUTER_API_KEY")

# Your search index (from Day 3)
index = Index(text_fields=["chunk", "title"], keyword_fields=[])
index.fit(chunks)

def text_search(query: str) -> list:
    """Search the FAQ database."""
    results = index.search(query, num_results=3)
    return [{"text": r["chunk"], "title": r["title"]} for r in results]

def run_faq_agent(question: str) -> str:
    """Run the FAQ agent with tool support."""

    tools = [{
        "type": "function",
        "function": {
            "name": "text_search",
            "description": "Search FAQ database",
            "parameters": {
                "type": "object",
                "properties": {
                    "query": {"type": "string", "description": "Search query"}
                },
                "required": ["query"]
            }
        }
    }]

    messages = [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": question}
    ]

    # Agent loop
    for _ in range(5):  # Max 5 iterations
        response = requests.post(
            "https://openrouter.ai/api/v1/chat/completions",
            headers={"Authorization": f"Bearer {OPENROUTER_API_KEY}"},
            json={"model": "google/gemini-2.0-flash-001",
                  "messages": messages, "tools": tools},
            timeout=60
        ).json()

        msg = response["choices"][0]["message"]

        if not msg.get("tool_calls"):
            return msg["content"]

        messages.append(msg)
        for tc in msg["tool_calls"]:
            args = json.loads(tc["function"]["arguments"])
            result = text_search(args["query"])
            messages.append({
                "role": "tool",
                "tool_call_id": tc["id"],
                "content": json.dumps(result)
            })

    return "Max iterations reached"

---PAGE---

Running Day 4

Build and Run

docker-compose build day4
docker run --rm -e OPENROUTER_API_KEY=your_key ai_agent_crashcoourse-day4

Interactive Mode

docker run --rm -it -e OPENROUTER_API_KEY=your_key \
  ai_agent_crashcoourse-day4 python main.py --interactive

---PAGE---

Day 4 Key Takeaways

What Makes an Agent

| Chatbot                     | Agent                              |
|-----------------------------|------------------------------------|
| Responds from training data | Uses tools to access external data |
| Static knowledge            | Dynamic knowledge retrieval        |
| No actions                  | Can perform actions                |
| Simple Q&A                  | Complex multi-step tasks           |

Key Points

  1. Tools define "agentic" capabilities - the distinction between responding from training data vs accessing domain-specific knowledge dynamically
  2. System prompts are critical - they guide when and how the agent uses tools
  3. The agent loop - call LLM → check for tool calls → execute tools → repeat until done
  4. OpenRouter provides access to many models with a unified API

---PAGE---

Day 5: Evaluation

Is our agent actually good? Today we learn how to answer this question systematically.

Why Evaluation Matters

Evaluation is critical for building reliable AI systems. Without proper evaluation:

  • You can't tell if changes improve or hurt performance
  • You can't compare different approaches
  • You can't build confidence before deploying to users

Key Concepts

  • Vibe Check: Manual testing - interact with the agent and see if responses make sense
  • Logging: Record all interactions for later analysis
  • LLM as a Judge: Use AI to evaluate AI responses automatically
  • Test Data Generation: Create test questions using AI

---PAGE---

Logging Agent Interactions

The first step is capturing data. We log every interaction:

import json
import secrets
from pathlib import Path
from datetime import datetime

LOG_DIR = Path('logs')
LOG_DIR.mkdir(exist_ok=True)

def save_log(agent, messages, source="user"):
    """Save interaction to JSON file."""
    entry = {
        "agent_name": agent.name,
        "system_prompt": agent.system_prompt,
        "model": agent.model,
        "messages": messages,
        "source": source,
        "timestamp": datetime.now().isoformat()
    }

    ts_str = datetime.now().strftime("%Y%m%d_%H%M%S")
    rand_hex = secrets.token_hex(3)
    filename = f"{agent.name}_{ts_str}_{rand_hex}.json"

    with open(LOG_DIR / filename, "w") as f:
        json.dump(entry, f, indent=2)

What to log:

  • System prompt used
  • Model name
  • User query
  • Tool calls and results
  • Final response
  • Source (user vs AI-generated)
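
For the evaluation step, the saved files need to be read back. A companion sketch to `save_log` (the directory name mirrors `LOG_DIR`):

```python
import json
from pathlib import Path

def load_logs(log_dir: str = 'logs') -> list[dict]:
    """Read every saved interaction back, sorted by filename."""
    entries = []
    for path in sorted(Path(log_dir).glob('*.json')):
        with open(path) as f:
            entries.append(json.load(f))
    return entries
```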

---PAGE---

LLM as a Judge

Instead of manually evaluating hundreds of responses, we use AI to evaluate AI:

EVALUATION_PROMPT = """
Evaluate the AI agent's answer using this checklist:

- instructions_follow: Agent followed instructions
- answer_relevant: Response addresses the question
- answer_clear: Answer is clear and understandable
- answer_citations: Proper citations included
- completeness: Response covers key aspects
- tool_usage: Search tool used appropriately

For each check, output true/false with a brief explanation.
"""

def evaluate_interaction(question, answer, log):
    """Use LLM to evaluate agent response."""
    prompt = f"""
    <QUESTION>{question}</QUESTION>
    <ANSWER>{answer}</ANSWER>
    <LOG>{json.dumps(log)}</LOG>
    """

    # Call evaluation LLM
    result = call_llm(EVALUATION_PROMPT, prompt)
    return result

Pro tip: Use a different model for evaluation than the one being evaluated. This reduces self-bias.

---PAGE---

Test Data Generation

We can use AI to generate realistic test questions:

QUESTION_GEN_PROMPT = """
Based on the FAQ content, generate realistic questions students might ask.

Questions should:
- Be natural and varied in style
- Range from simple to complex
- Include technical and general questions

Generate {num_questions} questions.
"""

import json
import random

def generate_test_questions(chunks, num_questions=10):
    """Generate test questions from FAQ content."""
    sample = random.sample(chunks, min(10, len(chunks)))
    content = json.dumps([c['chunk'] for c in sample])

    prompt = QUESTION_GEN_PROMPT.format(num_questions=num_questions)
    result = call_llm(prompt, content)
    return result['questions']

Benefits of AI-generated test data:

  • Faster than manual creation
  • Can cover edge cases
  • Scales easily

Limitations:

  • May not reflect real user behavior
  • Could miss important edge cases only real users find

---PAGE---

Calculating Metrics

After evaluation, calculate pass rates for each check:

import pandas as pd

def calculate_metrics(eval_results):
    """Calculate pass rates from evaluation results."""
    rows = []

    for result in eval_results:
        row = {"question": result["question"]}
        for check in result["checks"]:
            row[check["name"]] = check["pass"]
        rows.append(row)

    df = pd.DataFrame(rows)
    # Boolean columns average to per-check pass rates
    return df.mean(numeric_only=True)

# Example output:
# instructions_follow    0.30  (30% pass rate)
# answer_relevant        1.00  (100% pass rate)
# answer_citations       0.30  (30% pass rate)
# tool_usage             1.00  (100% pass rate)

Key insight: The most important check is answer_relevant. If this score is low, your agent isn't ready.

---PAGE---

Running Day 5

Build and Run

docker-compose build day5
docker run --rm -e OPENROUTER_API_KEY=your_key ai_agent_crashcoourse-day5

Demo Individual Components

# Logging only
docker run --rm -e OPENROUTER_API_KEY=your_key \
  ai_agent_crashcoourse-day5 python main.py --demo-logging

# Evaluate existing logs
docker run --rm -e OPENROUTER_API_KEY=your_key \
  ai_agent_crashcoourse-day5 python main.py --demo-eval

# Generate test questions
docker run --rm -e OPENROUTER_API_KEY=your_key \
  ai_agent_crashcoourse-day5 python main.py --demo-generate

---PAGE---

Day 5 Key Takeaways

Evaluation Process

| Step        | What                          | Why                      |
|-------------|-------------------------------|--------------------------|
| 1. Log      | Record all interactions       | Build evaluation dataset |
| 2. Generate | Create test questions with AI | Scale test coverage      |
| 3. Evaluate | LLM as a Judge                | Automate quality checks  |
| 4. Measure  | Calculate metrics             | Track improvements       |

What You Can Do With Evaluation

  1. Decide if quality is good enough for deployment
  2. Compare different approaches (chunking, search, prompts)
  3. Track improvements over time
  4. Identify edge cases that need attention

Course Summary

You've now built a complete AI agent system:

  • ✅ Ingest data from GitHub (Day 1)
  • ✅ Chunk documents intelligently (Day 2)
  • ✅ Search with text, vector, and hybrid methods (Day 3)
  • ✅ Answer questions using tools and LLMs (Day 4)
  • ✅ Evaluate and measure agent quality (Day 5)

Congratulations!

AI Agents Crash Course | Software Engineer Blog