RAG vs Knowledge Graphs — How to Give LLMs the Right Context
A practical comparison of Retrieval-Augmented Generation and Knowledge Graphs for grounding LLM responses, with architecture patterns, code examples, and guidance on when to use each approach.
In the previous posts, we built agents that call tools, verify their own reasoning, and do deep research. All of those agents share the same fundamental problem: the LLM doesn't know your data.
You can build the most sophisticated agent architecture in the world — but if it hallucinates facts about your company's products, your medical records, or your codebase, it's useless.
This post is about the two main approaches to solving that problem: Retrieval-Augmented Generation (RAG) and Knowledge Graphs (KG) — what they are, how they work, when to use each, and how to combine them.
The Core Problem: LLMs Don't Know Your Data
LLMs are trained on public internet data up to a cutoff date. They don't know:
- Your internal documentation
- Your product catalog
- What happened yesterday
- The relationships between entities in your domain
You have two choices: fine-tune the model on your data (expensive, slow, doesn't handle frequent updates), or inject relevant context at query time. The second approach is what RAG and Knowledge Graphs are all about.
What is RAG?
Retrieval-Augmented Generation is a simple idea: before the LLM generates a response, retrieve relevant documents from your data and include them in the prompt.
How RAG Works
```
User Query → Embed Query → Search Vector DB → Retrieve Top-K Chunks → Inject into Prompt → LLM Generates Answer
```
Step by step:
- Indexing phase (offline): Split your documents into chunks, generate embeddings for each chunk, store them in a vector database.
- Query phase (online): Embed the user's question, find the most similar chunks via vector search, stuff them into the LLM's context window.
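Under the hood, "most similar" typically means nearest neighbors by cosine similarity between embedding vectors. A toy illustration with made-up 3-dimensional vectors (real embedding models output hundreds or thousands of dimensions):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Hypothetical embeddings: the query should land closest to the pricing chunk
query_vec = [0.9, 0.1, 0.0]
chunk_vecs = {
    "pricing chunk": [0.8, 0.2, 0.1],
    "history chunk": [0.1, 0.0, 0.9],
}
best = max(chunk_vecs, key=lambda k: cosine_similarity(query_vec, chunk_vecs[k]))
# → "pricing chunk"
```

A vector database does the same comparison, just over millions of vectors with an approximate-nearest-neighbor index instead of a linear scan.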
RAG in Code
Here's a minimal RAG pipeline using ChromaDB and OpenAI:
```python
import chromadb
from openai import OpenAI

client = OpenAI()
chroma = chromadb.PersistentClient(path="./chroma_db")
collection = chroma.get_or_create_collection("docs")

# --- Indexing Phase ---

def index_documents(documents: list[dict]):
    """Index documents into the vector store."""
    for doc in documents:
        chunks = chunk_text(doc["content"], chunk_size=500, overlap=50)
        for i, chunk in enumerate(chunks):
            collection.add(
                ids=[f"{doc['id']}_chunk_{i}"],
                documents=[chunk],
                metadatas=[{"source": doc["source"], "doc_id": doc["id"]}]
            )

def chunk_text(text: str, chunk_size: int, overlap: int) -> list[str]:
    """Split text into overlapping chunks."""
    chunks = []
    start = 0
    while start < len(text):
        end = start + chunk_size
        chunks.append(text[start:end])
        start = end - overlap
    return chunks

# --- Query Phase ---

def rag_query(question: str, n_results: int = 5) -> str:
    """Answer a question using RAG."""
    # 1. Retrieve relevant chunks
    results = collection.query(
        query_texts=[question],
        n_results=n_results
    )

    # 2. Build context from retrieved chunks
    context_chunks = results["documents"][0]
    context = "\n\n---\n\n".join(context_chunks)

    # 3. Generate answer with context
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {
                "role": "system",
                "content": (
                    "Answer the user's question based on the provided context. "
                    "If the context doesn't contain the answer, say so. "
                    "Cite which parts of the context you used."
                )
            },
            {
                "role": "user",
                "content": f"Context:\n{context}\n\nQuestion: {question}"
            }
        ]
    )
    return response.choices[0].message.content
```
This works. For many use cases, this is all you need. But it has real limitations.
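One detail worth sanity-checking is the overlap arithmetic: the window advances by `chunk_size - overlap` characters, so consecutive chunks share their boundary text. Repeating the chunker here as a self-contained check:

```python
def chunk_text(text: str, chunk_size: int, overlap: int) -> list[str]:
    """Sliding-window chunker, same logic as in the pipeline above."""
    chunks, start = [], 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap  # stride of 450 for the defaults above
    return chunks

sizes = [len(c) for c in chunk_text("x" * 1200, chunk_size=500, overlap=50)]
# → [500, 500, 300]: windows start at offsets 0, 450, and 900
```

Note that an `overlap` greater than or equal to `chunk_size` would make the stride non-positive and loop forever, so validate the parameters in production code.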
Where RAG Struggles
1. No understanding of relationships. If you ask "Which teams report to the VP of Engineering?", RAG will search for chunks that mention "VP of Engineering." It might find a chunk about the VP and a chunk about Team A, but it can't connect them unless they appear in the same chunk.
2. Chunk boundary problems. Important information often spans multiple chunks. A paragraph about a product's pricing might get split across two chunks, and neither chunk alone answers the question.
3. Retrieval quality is fragile. The entire system depends on the embedding model finding the right chunks. If the user's question uses different terminology than the source document, retrieval fails silently — the LLM gets irrelevant context and produces a confident but wrong answer.
4. No reasoning over structure. RAG treats all documents as flat text. It can't answer "What's the shortest path from A to B?" or "Which products share components with Product X?" because those require traversing relationships.
What is a Knowledge Graph?
A Knowledge Graph stores information as entities (nodes) and relationships (edges), forming a structured, queryable network of facts.
```
[Person: Alice] --works_at--> [Company: Acme Corp]
[Person: Alice] --reports_to--> [Person: Bob]
[Person: Bob] --manages--> [Team: Platform]
[Team: Platform] --owns--> [Service: Auth API]
```
Instead of searching for similar text, you query the graph to traverse relationships and extract structured answers.
Knowledge Graph in Code
Here's a Knowledge Graph implementation using Neo4j:
```python
from neo4j import GraphDatabase

class KnowledgeGraph:
    def __init__(self, uri: str, user: str, password: str):
        self.driver = GraphDatabase.driver(uri, auth=(user, password))

    def close(self):
        self.driver.close()

    def add_entity(self, entity_type: str, name: str, properties: dict = None):
        """Add a node to the graph."""
        props = properties or {}
        props["name"] = name
        prop_string = ", ".join(f"{k}: ${k}" for k in props)
        query = f"MERGE (n:{entity_type} {{{prop_string}}}) RETURN n"
        with self.driver.session() as session:
            session.run(query, **props)

    def add_relationship(self, from_name: str, rel_type: str, to_name: str,
                         properties: dict = None):
        """Add an edge between two nodes."""
        props = properties or {}
        prop_string = ""
        if props:
            prop_string = " {" + ", ".join(f"{k}: ${k}" for k in props) + "}"
        query = (
            f"MATCH (a {{name: $from_name}}), (b {{name: $to_name}}) "
            f"MERGE (a)-[r:{rel_type}{prop_string}]->(b) RETURN r"
        )
        with self.driver.session() as session:
            session.run(query, from_name=from_name, to_name=to_name, **props)

    def query(self, cypher: str, params: dict = None) -> list[dict]:
        """Run a Cypher query and return results."""
        with self.driver.session() as session:
            result = session.run(cypher, **(params or {}))
            return [record.data() for record in result]
```
Populating the Graph
```python
kg = KnowledgeGraph("bolt://localhost:7687", "neo4j", "password")

# Add entities
kg.add_entity("Person", "Alice", {"role": "Senior Engineer"})
kg.add_entity("Person", "Bob", {"role": "VP of Engineering"})
kg.add_entity("Team", "Platform", {"focus": "Infrastructure"})
kg.add_entity("Service", "Auth API", {"language": "Python"})

# Add relationships
kg.add_relationship("Alice", "REPORTS_TO", "Bob")
kg.add_relationship("Bob", "MANAGES", "Platform")
kg.add_relationship("Platform", "OWNS", "Auth API")
kg.add_relationship("Alice", "CONTRIBUTES_TO", "Auth API")
```
Querying the Graph with an LLM
The key idea: use the LLM to convert natural language questions into graph queries.
````python
def kg_query_with_llm(question: str, kg: KnowledgeGraph) -> str:
    """Convert a natural language question to a Cypher query and execute it."""
    # Get the graph schema to help the LLM
    schema = kg.query(
        "CALL db.schema.visualization() YIELD nodes, relationships RETURN *"
    )

    # Ask the LLM to generate a Cypher query
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {
                "role": "system",
                "content": (
                    "You are a Neo4j Cypher expert. Convert the user's natural "
                    "language question into a Cypher query. Return ONLY the "
                    "Cypher query, no explanation.\n\n"
                    f"Graph schema: {schema}"
                )
            },
            {"role": "user", "content": question}
        ]
    )
    cypher = response.choices[0].message.content.strip()
    cypher = cypher.replace("```cypher", "").replace("```", "").strip()

    # Execute the query
    results = kg.query(cypher)

    # Generate a natural language answer
    answer_response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {
                "role": "system",
                "content": "Answer the question based on the query results."
            },
            {
                "role": "user",
                "content": (
                    f"Question: {question}\n"
                    f"Query: {cypher}\n"
                    f"Results: {results}"
                )
            }
        ]
    )
    return answer_response.choices[0].message.content
````
Now you can ask:
```python
answer = kg_query_with_llm("Which services does Alice contribute to?", kg)
# → "Alice contributes to the Auth API service."

answer = kg_query_with_llm("Who manages the team that owns Auth API?", kg)
# → "Bob manages the Platform team, which owns the Auth API."
```
RAG would struggle with the second question because it requires traversing two relationships: Auth API → Platform → Bob. A knowledge graph handles it naturally.
Where Knowledge Graphs Struggle
1. Schema design is hard. You need to decide upfront what entities and relationships matter. Real-world data is messy and doesn't always fit clean schemas.
2. Population is expensive. Extracting entities and relationships from unstructured text (documents, emails, reports) requires NLP pipelines or manual curation.
3. Poor at free-text answers. Knowledge graphs are great at structured queries but terrible at "Summarize the key points of this document" — they don't store the original text.
4. Cypher generation can fail. The LLM might generate invalid or inefficient queries, especially for complex schemas.
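One mitigation for failed Cypher generation is to validate the model's output before running it. A minimal sketch (the clause lists are illustrative, not exhaustive) that strips code fences and refuses anything that would write to the graph:

```python
READ_CLAUSES = ("MATCH", "OPTIONAL", "CALL", "WITH", "RETURN", "UNWIND")
WRITE_CLAUSES = ("CREATE", "MERGE", "DELETE", "DETACH", "SET", "REMOVE", "DROP")

def sanitize_cypher(raw: str) -> str:
    """Strip markdown fences from LLM output and reject write queries."""
    query = raw.strip().strip("`")
    if query.lower().startswith("cypher"):  # fence language tag left behind
        query = query[len("cypher"):]
    query = query.strip()
    tokens = query.upper().split()
    if not tokens or tokens[0] not in READ_CLAUSES:
        raise ValueError(f"Expected a read clause, got: {query!r}")
    if any(tok in WRITE_CLAUSES for tok in tokens):
        raise ValueError(f"Write clauses are not allowed: {query!r}")
    return query
```

A keyword check like this is coarse; for stricter validation you could also run the query under Neo4j's EXPLAIN to catch syntax errors without executing it.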
Side-by-Side Comparison
| Dimension | RAG | Knowledge Graph |
|---|---|---|
| Data format | Unstructured text (docs, PDFs, web pages) | Structured entities and relationships |
| Query type | "What does the doc say about X?" | "How is X related to Y?" |
| Setup effort | Low — chunk, embed, store | High — design schema, extract entities |
| Handles updates | Re-embed changed docs | Update nodes/edges |
| Multi-hop reasoning | Poor — limited to what's in retrieved chunks | Excellent — traverse relationships |
| Free-text answers | Excellent — has the source text | Poor — only has structured facts |
| Scalability | Scales well with vector DBs | Can get complex with large graphs |
| Hallucination risk | Medium — depends on retrieval quality | Low — answers come from explicit facts |
When to Use RAG
- Your data is mostly unstructured text (documentation, articles, support tickets)
- Users ask "what does it say about..." questions
- You need to get something working quickly
- Your data changes frequently and you need to re-index fast
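A cheap way to keep re-indexing fast is to skip documents that haven't changed. A sketch using content hashes (the `index_hashes` mapping is hypothetical bookkeeping — in practice it could live in the vector store's chunk metadata):

```python
import hashlib

def content_hash(text: str) -> str:
    """Stable fingerprint of a document's content."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

def docs_to_reindex(docs: list[dict], index_hashes: dict[str, str]) -> list[str]:
    """Return ids of docs that are new or changed since the last indexing run."""
    return [
        doc["id"]
        for doc in docs
        if index_hashes.get(doc["id"]) != content_hash(doc["content"])
    ]
```

Only the returned ids need their chunks deleted and re-embedded; everything else stays untouched.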
When to Use a Knowledge Graph
- Your data has rich relationships (org charts, supply chains, product catalogs)
- Users ask "how is X related to Y" or "what depends on X" questions
- You need precise, auditable answers (compliance, medical, legal)
- You can invest in schema design and entity extraction
The Best of Both Worlds: GraphRAG
What if you could combine the free-text understanding of RAG with the relational reasoning of Knowledge Graphs? That's GraphRAG.
The idea: build a knowledge graph from your documents, then use both vector search and graph traversal to retrieve context for the LLM.
GraphRAG Architecture
```
User Query
    ├── Vector Search → Relevant text chunks
    └── Graph Query   → Related entities and relationships
                ↓
          Merged Context
                ↓
        LLM generates answer
```
Building a GraphRAG Pipeline
```python
import json

class GraphRAG:
    """Combines vector search (RAG) with knowledge graph traversal."""

    def __init__(self, collection, kg: KnowledgeGraph):
        self.collection = collection  # ChromaDB collection
        self.kg = kg

    def extract_entities(self, text: str) -> dict:
        """Use an LLM to extract entities and relationships from text."""
        response = client.chat.completions.create(
            model="gpt-4o",
            messages=[
                {
                    "role": "system",
                    "content": (
                        "Extract entities and relationships from the text. "
                        "Return JSON with format: "
                        '{"entities": [{"name": "...", "type": "..."}], '
                        '"relationships": [{"from": "...", "type": "...", '
                        '"to": "..."}]}'
                    )
                },
                {"role": "user", "content": text}
            ],
            response_format={"type": "json_object"}
        )
        return json.loads(response.choices[0].message.content)

    def index_document(self, doc_id: str, content: str, source: str):
        """Index a document in both vector store and knowledge graph."""
        # 1. Vector store: chunk and embed
        chunks = chunk_text(content, chunk_size=500, overlap=50)
        for i, chunk in enumerate(chunks):
            self.collection.add(
                ids=[f"{doc_id}_chunk_{i}"],
                documents=[chunk],
                metadatas=[{"source": source, "doc_id": doc_id}]
            )

        # 2. Knowledge graph: extract and store entities
        extracted = self.extract_entities(content)
        for entity in extracted["entities"]:
            self.kg.add_entity(entity["type"], entity["name"])
        for rel in extracted["relationships"]:
            self.kg.add_relationship(rel["from"], rel["type"], rel["to"])

    def query(self, question: str) -> str:
        """Answer a question using both vector search and graph traversal."""
        # 1. Vector search for relevant text chunks
        vector_results = self.collection.query(
            query_texts=[question], n_results=5
        )
        text_context = "\n\n".join(vector_results["documents"][0])

        # 2. Extract entities from the question, then query the graph
        question_entities = self.extract_entities(question)
        graph_context = ""
        for entity in question_entities.get("entities", []):
            neighbors = self.kg.query(
                "MATCH (n {name: $name})-[r]-(m) "
                "RETURN n.name, type(r) AS rel, m.name LIMIT 20",
                {"name": entity["name"]}
            )
            if neighbors:
                graph_context += f"\nRelationships for {entity['name']}:\n"
                for n in neighbors:
                    graph_context += (
                        f"  {n['n.name']} --{n['rel']}--> {n['m.name']}\n"
                    )

        # 3. Combine both contexts and generate answer
        combined_context = (
            f"=== Text Context (from documents) ===\n{text_context}\n\n"
            f"=== Graph Context (relationships) ===\n{graph_context}"
        )
        response = client.chat.completions.create(
            model="gpt-4o",
            messages=[
                {
                    "role": "system",
                    "content": (
                        "Answer the question using both the text context and "
                        "graph context provided. The text context gives you "
                        "detailed information from documents. The graph context "
                        "shows entity relationships. Use both to give a "
                        "complete answer."
                    )
                },
                {
                    "role": "user",
                    "content": (
                        f"Context:\n{combined_context}\n\nQuestion: {question}"
                    )
                }
            ]
        )
        return response.choices[0].message.content
```
Using GraphRAG
```python
graph_rag = GraphRAG(collection, kg)

# Index your documents (this populates both vector store and graph)
graph_rag.index_document(
    "doc_1",
    "Alice is a senior engineer at Acme Corp. She works on the Platform "
    "team led by Bob, the VP of Engineering. The Platform team owns the "
    "Auth API, which handles authentication for all Acme products.",
    source="company_wiki"
)

# Query — gets both text details and relationship traversal
answer = graph_rag.query(
    "What is Alice's relationship to the Auth API, and what does it do?"
)
```
This query benefits from both approaches:
- Vector search finds the chunk describing what Auth API does ("handles authentication for all Acme products")
- Graph traversal finds the chain: Alice → Platform team → Auth API
Neither approach alone gives the complete picture.
Practical Tips
Start with RAG
Unless your data is inherently graph-structured (org charts, dependency trees, supply chains), start with plain RAG. It's simpler to build, easier to debug, and works well for most use cases.
Improve RAG Before Adding a Graph
Before reaching for a knowledge graph, try these RAG improvements:
- Better chunking — Use semantic chunking (split on paragraph/section boundaries) instead of fixed character counts
- Hybrid search — Combine vector search with keyword search (BM25) for better retrieval
- Re-ranking — Use a cross-encoder model to re-rank retrieved chunks before passing to the LLM
- Query expansion — Use the LLM to rewrite the user's question into multiple search queries
- Metadata filtering — Filter chunks by source, date, or category before vector search
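For hybrid search, you need a way to merge the vector and keyword result lists into one ranking; reciprocal rank fusion is a common, model-free choice. A sketch with illustrative chunk ids:

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge ranked id lists: each id scores sum(1 / (k + rank)) across lists."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical results from a vector search and a BM25 keyword search
vector_hits = ["c1", "c2", "c3"]
keyword_hits = ["c2", "c4", "c1"]
fused = reciprocal_rank_fusion([vector_hits, keyword_hits])
# c2 wins: placing 2nd and 1st beats c1's 1st and 3rd
```

The constant `k` dampens the advantage of top ranks; 60 is the value commonly used in the literature.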
```python
# Example: metadata filtering before vector search — for true hybrid
# search, combine these results with a keyword index such as BM25
results = collection.query(
    query_texts=[question],
    n_results=10,
    where={"source": "product_docs"},  # filter by metadata
)
```
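Semantic chunking, meanwhile, can be as simple as splitting on paragraph boundaries and packing paragraphs into a size budget. A minimal sketch (paragraph here means a blank-line-separated block; a paragraph longer than `max_chars` still becomes its own oversized chunk):

```python
def semantic_chunk(text: str, max_chars: int = 1500) -> list[str]:
    """Pack blank-line-separated paragraphs into chunks of about max_chars."""
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    chunks: list[str] = []
    current = ""
    for para in paragraphs:
        if current and len(current) + len(para) + 2 > max_chars:
            chunks.append(current)  # budget exceeded: start a new chunk
            current = para
        else:
            current = f"{current}\n\n{para}" if current else para
    if current:
        chunks.append(current)
    return chunks
```

Because chunks now end on paragraph boundaries, a retrieved chunk is far less likely to cut a sentence or a pricing table in half.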
Add a Knowledge Graph When You Hit RAG's Limits
You'll know it's time when:
- Users ask multi-hop questions that RAG can't answer
- You need to traverse relationships ("What depends on X?")
- You need precise, auditable answers grounded in explicit facts
- Your domain has clear entities and relationships
Use LLMs for Entity Extraction
Building a knowledge graph manually doesn't scale. Use LLMs to extract entities and relationships from your documents automatically. The extract_entities method in the GraphRAG example above shows the pattern — but for production, add validation and deduplication:
```python
import json

def deduplicate_entities(entities: list[dict]) -> list[dict]:
    """Merge duplicate entities (e.g., 'Auth API' and 'Authentication API')."""
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {
                "role": "system",
                "content": (
                    "Given these entities, identify duplicates (same entity "
                    "with different names) and return a deduplicated list. "
                    "Return JSON."
                )
            },
            {
                "role": "user",
                "content": json.dumps(entities)
            }
        ],
        response_format={"type": "json_object"}
    )
    return json.loads(response.choices[0].message.content)
```
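An LLM pass over every entity is relatively expensive, so it pays to run a cheap exact-match pass first and only send the survivors to the model. A sketch that merges entities whose normalized name and type match (field names follow the extraction format used above):

```python
import re

def normalize_name(name: str) -> str:
    """Cheap canonical form: lowercase, punctuation to spaces, trimmed."""
    return re.sub(r"[^a-z0-9]+", " ", name.lower()).strip()

def merge_exact_duplicates(entities: list[dict]) -> list[dict]:
    """First-pass dedup before any LLM pass: keep the first-seen surface form."""
    seen: dict[tuple[str, str], dict] = {}
    for entity in entities:
        key = (normalize_name(entity["name"]), entity["type"])
        seen.setdefault(key, entity)
    return list(seen.values())
```

This catches trivial variants like "Auth API" vs "auth-api"; genuinely different names for the same entity ("Auth API" vs "Authentication API") still need the LLM pass.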
Summary
| Approach | Best For | Start Here If... |
|---|---|---|
| RAG | Text-heavy data, "what does it say" questions | You have documents and need answers fast |
| Knowledge Graph | Relationship-heavy data, "how is X related to Y" questions | Your data has clear entities and connections |
| GraphRAG | Complex domains that need both text and relationships | RAG alone can't answer multi-hop questions |
The progression is natural: start with RAG, improve it, and add a knowledge graph when your use case demands relational reasoning. Don't over-engineer from the start — let the questions your users actually ask guide the architecture.
This post is part of the AI Agents series. Previous posts covered tool calling, ReAct agents, and deep search.