RAG vs Knowledge Graphs — How to Give LLMs the Right Context
A practical comparison of Retrieval-Augmented Generation and Knowledge Graphs for grounding LLM responses, with architecture patterns, code examples, and guidance on when to use each approach.
In the previous posts, we built agents that call tools, verify their own reasoning, and do deep research. All of those agents share the same fundamental problem: the LLM doesn't know your data.
You can build the most sophisticated agent architecture in the world — but if it hallucinates facts about your company's products, your medical records, or your codebase, it's useless.
This post is about the two main approaches to solving that problem: Retrieval-Augmented Generation (RAG) and Knowledge Graphs (KG) — what they are, how they work, when to use each, and how to combine them.
The Core Problem: LLMs Don't Know Your Data
LLMs are trained on public internet data up to a cutoff date. They don't know:
- Your internal documentation
- Your product catalog
- What happened yesterday
- The relationships between entities in your domain
You have two choices: fine-tune the model on your data (expensive, slow, doesn't handle frequent updates), or inject relevant context at query time. The second approach is what RAG and Knowledge Graphs are all about.
What is RAG?
Retrieval-Augmented Generation is a simple idea: before the LLM generates a response, retrieve relevant documents from your data and include them in the prompt.
How RAG Works
```
User Query → Embed Query → Search Vector DB → Retrieve Top-K Chunks → Inject into Prompt → LLM Generates Answer
```
Step by step:
- Indexing phase (offline): Split your documents into chunks, generate embeddings for each chunk, store them in a vector database.
- Query phase (online): Embed the user's question, find the most similar chunks via vector search, stuff them into the LLM's context window.
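Under the hood, "most similar" typically means nearest neighbors by cosine similarity between embedding vectors. A toy illustration with made-up 3-dimensional vectors (real embedding models output hundreds or thousands of dimensions):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Hypothetical embeddings: the query should land closest to the pricing chunk
query_vec = [0.9, 0.1, 0.0]
chunk_vecs = {
    "pricing chunk": [0.8, 0.2, 0.1],
    "history chunk": [0.1, 0.0, 0.9],
}
best = max(chunk_vecs, key=lambda k: cosine_similarity(query_vec, chunk_vecs[k]))
# → "pricing chunk"
```

A vector database does the same comparison, just over millions of vectors with an approximate-nearest-neighbor index instead of a linear scan.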
RAG in Code
Here's a minimal RAG pipeline using ChromaDB and OpenAI:
```python
import chromadb
from openai import OpenAI

client = OpenAI()
chroma = chromadb.PersistentClient(path="./chroma_db")
collection = chroma.get_or_create_collection("docs")

# --- Indexing Phase ---

def index_documents(documents: list[dict]):
    """Index documents into the vector store."""
    for doc in documents:
        chunks = chunk_text(doc["content"], chunk_size=500, overlap=50)
        for i, chunk in enumerate(chunks):
            collection.add(
                ids=[f"{doc['id']}_chunk_{i}"],
                documents=[chunk],
                metadatas=[{"source": doc["source"], "doc_id": doc["id"]}]
            )

def chunk_text(text: str, chunk_size: int, overlap: int) -> list[str]:
    """Split text into overlapping chunks."""
    chunks = []
    start = 0
    while start < len(text):
        end = start + chunk_size
        chunks.append(text[start:end])
        start = end - overlap
    return chunks

# --- Query Phase ---

def rag_query(question: str, n_results: int = 5) -> str:
    """Answer a question using RAG."""
    # 1. Retrieve relevant chunks
    results = collection.query(
        query_texts=[question],
        n_results=n_results
    )

    # 2. Build context from retrieved chunks
    context_chunks = results["documents"][0]
    context = "\n\n---\n\n".join(context_chunks)

    # 3. Generate answer with context
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {
                "role": "system",
                "content": (
                    "Answer the user's question based on the provided context. "
                    "If the context doesn't contain the answer, say so. "
                    "Cite which parts of the context you used."
                )
            },
            {
                "role": "user",
                "content": f"Context:\n{context}\n\nQuestion: {question}"
            }
        ]
    )
    return response.choices[0].message.content
```
This works. For many use cases, this is all you need. But it has real limitations.
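One detail worth sanity-checking is the overlap arithmetic: the window advances by `chunk_size - overlap` characters, so consecutive chunks share their boundary text. Repeating the chunker here as a self-contained check:

```python
def chunk_text(text: str, chunk_size: int, overlap: int) -> list[str]:
    """Sliding-window chunker, same logic as in the pipeline above."""
    chunks, start = [], 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap  # stride of 450 for the defaults above
    return chunks

sizes = [len(c) for c in chunk_text("x" * 1200, chunk_size=500, overlap=50)]
# → [500, 500, 300]: windows start at offsets 0, 450, and 900
```

Note that an `overlap` greater than or equal to `chunk_size` would make the stride non-positive and loop forever, so validate the parameters in production code.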
Where RAG Struggles
1. No understanding of relationships. If you ask "Which teams report to the VP of Engineering?", RAG will search for chunks that mention "VP of Engineering." It might find a chunk about the VP and a chunk about Team A, but it can't connect them unless they appear in the same chunk.
2. Chunk boundary problems. Important information often spans multiple chunks. A paragraph about a product's pricing might get split across two chunks, and neither chunk alone answers the question.
3. Retrieval quality is fragile. The entire system depends on the embedding model finding the right chunks. If the user's question uses different terminology than the source document, retrieval fails silently — the LLM gets irrelevant context and produces a confident but wrong answer.
4. No reasoning over structure. RAG treats all documents as flat text. It can't answer "What's the shortest path from A to B?" or "Which products share components with Product X?" because those require traversing relationships.
What is a Knowledge Graph?
A Knowledge Graph stores information as entities (nodes) and relationships (edges), forming a structured, queryable network of facts.
```
[Person: Alice] --works_at--> [Company: Acme Corp]
[Person: Alice] --reports_to--> [Person: Bob]
[Person: Bob] --manages--> [Team: Platform]
[Team: Platform] --owns--> [Service: Auth API]
```
Instead of searching for similar text, you query the graph to traverse relationships and extract structured answers.
Knowledge Graph in Code
Here's a Knowledge Graph implementation using Neo4j:
```python
from neo4j import GraphDatabase

class KnowledgeGraph:
    def __init__(self, uri: str, user: str, password: str):
        self.driver = GraphDatabase.driver(uri, auth=(user, password))

    def close(self):
        self.driver.close()

    def add_entity(self, entity_type: str, name: str, properties: dict = None):
        """Add a node to the graph."""
        props = properties or {}
        props["name"] = name
        prop_string = ", ".join(f"{k}: ${k}" for k in props)
        query = f"MERGE (n:{entity_type} {{{prop_string}}}) RETURN n"
        with self.driver.session() as session:
            session.run(query, **props)

    def add_relationship(self, from_name: str, rel_type: str, to_name: str,
                         properties: dict = None):
        """Add an edge between two nodes."""
        props = properties or {}
        prop_string = ""
        if props:
            prop_string = " {" + ", ".join(f"{k}: ${k}" for k in props) + "}"
        query = (
            f"MATCH (a {{name: $from_name}}), (b {{name: $to_name}}) "
            f"MERGE (a)-[r:{rel_type}{prop_string}]->(b) RETURN r"
        )
        with self.driver.session() as session:
            session.run(query, from_name=from_name, to_name=to_name, **props)

    def query(self, cypher: str, params: dict = None) -> list[dict]:
        """Run a Cypher query and return results."""
        with self.driver.session() as session:
            result = session.run(cypher, **(params or {}))
            return [record.data() for record in result]
```
Populating the Graph
```python
kg = KnowledgeGraph("bolt://localhost:7687", "neo4j", "password")

# Add entities
kg.add_entity("Person", "Alice", {"role": "Senior Engineer"})
kg.add_entity("Person", "Bob", {"role": "VP of Engineering"})
kg.add_entity("Team", "Platform", {"focus": "Infrastructure"})
kg.add_entity("Service", "Auth API", {"language": "Python"})

# Add relationships
kg.add_relationship("Alice", "REPORTS_TO", "Bob")
kg.add_relationship("Bob", "MANAGES", "Platform")
kg.add_relationship("Platform", "OWNS", "Auth API")
kg.add_relationship("Alice", "CONTRIBUTES_TO", "Auth API")
```
Querying the Graph with an LLM
The key idea: use the LLM to convert natural language questions into graph queries.
````python
def kg_query_with_llm(question: str, kg: KnowledgeGraph) -> str:
    """Convert a natural language question to a Cypher query and execute it."""
    # Get the graph schema to help the LLM
    schema = kg.query(
        "CALL db.schema.visualization() YIELD nodes, relationships RETURN *"
    )

    # Ask the LLM to generate a Cypher query
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {
                "role": "system",
                "content": (
                    "You are a Neo4j Cypher expert. Convert the user's natural "
                    "language question into a Cypher query. Return ONLY the "
                    "Cypher query, no explanation.\n\n"
                    f"Graph schema: {schema}"
                )
            },
            {"role": "user", "content": question}
        ]
    )
    cypher = response.choices[0].message.content.strip()
    cypher = cypher.replace("```cypher", "").replace("```", "").strip()

    # Execute the query
    results = kg.query(cypher)

    # Generate a natural language answer
    answer_response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {
                "role": "system",
                "content": "Answer the question based on the query results."
            },
            {
                "role": "user",
                "content": (
                    f"Question: {question}\n"
                    f"Query: {cypher}\n"
                    f"Results: {results}"
                )
            }
        ]
    )
    return answer_response.choices[0].message.content
````
Now you can ask:
```python
answer = kg_query_with_llm("Which services does Alice contribute to?", kg)
# → "Alice contributes to the Auth API service."

answer = kg_query_with_llm("Who manages the team that owns Auth API?", kg)
# → "Bob manages the Platform team, which owns the Auth API."
```
RAG would struggle with the second question because it requires traversing two relationships: Auth API → Platform → Bob. A knowledge graph handles it naturally.
Where Knowledge Graphs Struggle
1. Schema design is hard. You need to decide upfront what entities and relationships matter. Real-world data is messy and doesn't always fit clean schemas.
2. Population is expensive. Extracting entities and relationships from unstructured text (documents, emails, reports) requires NLP pipelines or manual curation.
3. Poor at free-text answers. Knowledge graphs are great at structured queries but terrible at "Summarize the key points of this document" — they don't store the original text.
4. Cypher generation can fail. The LLM might generate invalid or inefficient queries, especially for complex schemas.
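One mitigation for failed Cypher generation is to validate the model's output before running it. A minimal sketch (the clause lists are illustrative, not exhaustive) that strips code fences and refuses anything that would write to the graph:

```python
READ_CLAUSES = ("MATCH", "OPTIONAL", "CALL", "WITH", "RETURN", "UNWIND")
WRITE_CLAUSES = ("CREATE", "MERGE", "DELETE", "DETACH", "SET", "REMOVE", "DROP")

def sanitize_cypher(raw: str) -> str:
    """Strip markdown fences from LLM output and reject write queries."""
    query = raw.strip().strip("`")
    if query.lower().startswith("cypher"):  # fence language tag left behind
        query = query[len("cypher"):]
    query = query.strip()
    tokens = query.upper().split()
    if not tokens or tokens[0] not in READ_CLAUSES:
        raise ValueError(f"Expected a read clause, got: {query!r}")
    if any(tok in WRITE_CLAUSES for tok in tokens):
        raise ValueError(f"Write clauses are not allowed: {query!r}")
    return query
```

A keyword check like this is coarse; for stricter validation you could also run the query under Neo4j's EXPLAIN to catch syntax errors without executing it.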
Side-by-Side Comparison
| Dimension | RAG | Knowledge Graph |
|---|---|---|
| Data format | Unstructured text (docs, PDFs, web pages) | Structured entities and relationships |
| Query type | "What does the doc say about X?" | "How is X related to Y?" |
| Setup effort | Low — chunk, embed, store | High — design schema, extract entities |
| Handles updates | Re-embed changed docs | Update nodes/edges |
| Multi-hop reasoning | Poor — limited to what's in retrieved chunks | Excellent — traverse relationships |
| Free-text answers | Excellent — has the source text | Poor — only has structured facts |
| Scalability | Scales well with vector DBs | Can get complex with large graphs |
| Hallucination risk | Medium — depends on retrieval quality | Low — answers come from explicit facts |
When to Use RAG
- Your data is mostly unstructured text (documentation, articles, support tickets)
- Users ask "what does it say about..." questions
- You need to get something working quickly
- Your data changes frequently and you need to re-index fast
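A cheap way to keep re-indexing fast is to skip documents that haven't changed. A sketch using content hashes (the `index_hashes` mapping is hypothetical bookkeeping — in practice it could live in the vector store's chunk metadata):

```python
import hashlib

def content_hash(text: str) -> str:
    """Stable fingerprint of a document's content."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

def docs_to_reindex(docs: list[dict], index_hashes: dict[str, str]) -> list[str]:
    """Return ids of docs that are new or changed since the last indexing run."""
    return [
        doc["id"]
        for doc in docs
        if index_hashes.get(doc["id"]) != content_hash(doc["content"])
    ]
```

Only the returned ids need their chunks deleted and re-embedded; everything else stays untouched.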
When to Use a Knowledge Graph
- Your data has rich relationships (org charts, supply chains, product catalogs)
- Users ask "how is X related to Y" or "what depends on X" questions
- You need precise, auditable answers (compliance, medical, legal)
- You can invest in schema design and entity extraction
The Best of Both Worlds: GraphRAG
What if you could combine the free-text understanding of RAG with the relational reasoning of Knowledge Graphs? That's GraphRAG.
The idea: build a knowledge graph from your documents, then use both vector search and graph traversal to retrieve context for the LLM.
GraphRAG Architecture
```
User Query
    ├── Vector Search → Relevant text chunks
    └── Graph Query   → Related entities and relationships
                ↓
          Merged Context
                ↓
        LLM generates answer
```
Building a GraphRAG Pipeline
```python
import json

class GraphRAG:
    """Combines vector search (RAG) with knowledge graph traversal."""

    def __init__(self, collection, kg: KnowledgeGraph):
        self.collection = collection  # ChromaDB collection
        self.kg = kg

    def extract_entities(self, text: str) -> dict:
        """Use an LLM to extract entities and relationships from text."""
        response = client.chat.completions.create(
            model="gpt-4o",
            messages=[
                {
                    "role": "system",
                    "content": (
                        "Extract entities and relationships from the text. "
                        "Return JSON with format: "
                        '{"entities": [{"name": "...", "type": "..."}], '
                        '"relationships": [{"from": "...", "type": "...", '
                        '"to": "..."}]}'
                    )
                },
                {"role": "user", "content": text}
            ],
            response_format={"type": "json_object"}
        )
        return json.loads(response.choices[0].message.content)

    def index_document(self, doc_id: str, content: str, source: str):
        """Index a document in both vector store and knowledge graph."""
        # 1. Vector store: chunk and embed
        chunks = chunk_text(content, chunk_size=500, overlap=50)
        for i, chunk in enumerate(chunks):
            self.collection.add(
                ids=[f"{doc_id}_chunk_{i}"],
                documents=[chunk],
                metadatas=[{"source": source, "doc_id": doc_id}]
            )

        # 2. Knowledge graph: extract and store entities
        extracted = self.extract_entities(content)
        for entity in extracted["entities"]:
            self.kg.add_entity(entity["type"], entity["name"])
        for rel in extracted["relationships"]:
            self.kg.add_relationship(rel["from"], rel["type"], rel["to"])

    def query(self, question: str) -> str:
        """Answer a question using both vector search and graph traversal."""
        # 1. Vector search for relevant text chunks
        vector_results = self.collection.query(
            query_texts=[question], n_results=5
        )
        text_context = "\n\n".join(vector_results["documents"][0])

        # 2. Extract entities from the question, then query the graph
        question_entities = self.extract_entities(question)
        graph_context = ""
        for entity in question_entities.get("entities", []):
            neighbors = self.kg.query(
                "MATCH (n {name: $name})-[r]-(m) "
                "RETURN n.name, type(r) AS rel, m.name LIMIT 20",
                {"name": entity["name"]}
            )
            if neighbors:
                graph_context += f"\nRelationships for {entity['name']}:\n"
                for n in neighbors:
                    graph_context += (
                        f"  {n['n.name']} --{n['rel']}--> {n['m.name']}\n"
                    )

        # 3. Combine both contexts and generate answer
        combined_context = (
            f"=== Text Context (from documents) ===\n{text_context}\n\n"
            f"=== Graph Context (relationships) ===\n{graph_context}"
        )
        response = client.chat.completions.create(
            model="gpt-4o",
            messages=[
                {
                    "role": "system",
                    "content": (
                        "Answer the question using both the text context and "
                        "graph context provided. The text context gives you "
                        "detailed information from documents. The graph context "
                        "shows entity relationships. Use both to give a "
                        "complete answer."
                    )
                },
                {
                    "role": "user",
                    "content": (
                        f"Context:\n{combined_context}\n\nQuestion: {question}"
                    )
                }
            ]
        )
        return response.choices[0].message.content
```
Using GraphRAG
```python
graph_rag = GraphRAG(collection, kg)

# Index your documents (this populates both vector store and graph)
graph_rag.index_document(
    "doc_1",
    "Alice is a senior engineer at Acme Corp. She works on the Platform "
    "team led by Bob, the VP of Engineering. The Platform team owns the "
    "Auth API, which handles authentication for all Acme products.",
    source="company_wiki"
)

# Query — gets both text details and relationship traversal
answer = graph_rag.query(
    "What is Alice's relationship to the Auth API, and what does it do?"
)
```
This query benefits from both approaches:
- Vector search finds the chunk describing what Auth API does ("handles authentication for all Acme products")
- Graph traversal finds the chain: Alice → Platform team → Auth API
Neither approach alone gives the complete picture.
Practical Tips
Start with RAG
Unless your data is inherently graph-structured (org charts, dependency trees, supply chains), start with plain RAG. It's simpler to build, easier to debug, and works well for most use cases.
Improve RAG Before Adding a Graph
Before reaching for a knowledge graph, try these RAG improvements:
- Better chunking — Use semantic chunking (split on paragraph/section boundaries) instead of fixed character counts
- Hybrid search — Combine vector search with keyword search (BM25) for better retrieval
- Re-ranking — Use a cross-encoder model to re-rank retrieved chunks before passing to the LLM
- Query expansion — Use the LLM to rewrite the user's question into multiple search queries
- Metadata filtering — Filter chunks by source, date, or category before vector search
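For hybrid search, you need a way to merge the vector and keyword result lists into one ranking; reciprocal rank fusion is a common, model-free choice. A sketch with illustrative chunk ids:

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge ranked id lists: each id scores sum(1 / (k + rank)) across lists."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical results from a vector search and a BM25 keyword search
vector_hits = ["c1", "c2", "c3"]
keyword_hits = ["c2", "c4", "c1"]
fused = reciprocal_rank_fusion([vector_hits, keyword_hits])
# c2 wins: placing 2nd and 1st beats c1's 1st and 3rd
```

The constant `k` dampens the advantage of top ranks; 60 is the value commonly used in the literature.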
```python
# Example: metadata filtering before vector search — for true hybrid
# search, combine these results with a keyword index such as BM25
results = collection.query(
    query_texts=[question],
    n_results=10,
    where={"source": "product_docs"},  # filter by metadata
)
```
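Semantic chunking, meanwhile, can be as simple as splitting on paragraph boundaries and packing paragraphs into a size budget. A minimal sketch (paragraph here means a blank-line-separated block; a paragraph longer than `max_chars` still becomes its own oversized chunk):

```python
def semantic_chunk(text: str, max_chars: int = 1500) -> list[str]:
    """Pack blank-line-separated paragraphs into chunks of about max_chars."""
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    chunks: list[str] = []
    current = ""
    for para in paragraphs:
        if current and len(current) + len(para) + 2 > max_chars:
            chunks.append(current)  # budget exceeded: start a new chunk
            current = para
        else:
            current = f"{current}\n\n{para}" if current else para
    if current:
        chunks.append(current)
    return chunks
```

Because chunks now end on paragraph boundaries, a retrieved chunk is far less likely to cut a sentence or a pricing table in half.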
Add a Knowledge Graph When You Hit RAG's Limits
You'll know it's time when:
- Users ask multi-hop questions that RAG can't answer
- You need to traverse relationships ("What depends on X?")
- You need precise, auditable answers grounded in explicit facts
- Your domain has clear entities and relationships
Use LLMs for Entity Extraction
Building a knowledge graph manually doesn't scale. Use LLMs to extract entities and relationships from your documents automatically. The extract_entities method in the GraphRAG example above shows the pattern — but for production, add validation and deduplication:
```python
import json

def deduplicate_entities(entities: list[dict]) -> list[dict]:
    """Merge duplicate entities (e.g., 'Auth API' and 'Authentication API')."""
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {
                "role": "system",
                "content": (
                    "Given these entities, identify duplicates (same entity "
                    "with different names) and return a deduplicated list. "
                    "Return JSON."
                )
            },
            {
                "role": "user",
                "content": json.dumps(entities)
            }
        ],
        response_format={"type": "json_object"}
    )
    return json.loads(response.choices[0].message.content)
```
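An LLM pass over every entity is relatively expensive, so it pays to run a cheap exact-match pass first and only send the survivors to the model. A sketch that merges entities whose normalized name and type match (field names follow the extraction format used above):

```python
import re

def normalize_name(name: str) -> str:
    """Cheap canonical form: lowercase, punctuation to spaces, trimmed."""
    return re.sub(r"[^a-z0-9]+", " ", name.lower()).strip()

def merge_exact_duplicates(entities: list[dict]) -> list[dict]:
    """First-pass dedup before any LLM pass: keep the first-seen surface form."""
    seen: dict[tuple[str, str], dict] = {}
    for entity in entities:
        key = (normalize_name(entity["name"]), entity["type"])
        seen.setdefault(key, entity)
    return list(seen.values())
```

This catches trivial variants like "Auth API" vs "auth-api"; genuinely different names for the same entity ("Auth API" vs "Authentication API") still need the LLM pass.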
Summary
| Approach | Best For | Start Here If... |
|---|---|---|
| RAG | Text-heavy data, "what does it say" questions | You have documents and need answers fast |
| Knowledge Graph | Relationship-heavy data, "how is X related to Y" questions | Your data has clear entities and connections |
| GraphRAG | Complex domains that need both text and relationships | RAG alone can't answer multi-hop questions |
The progression is natural: start with RAG, improve it, and add a knowledge graph when your use case demands relational reasoning. Don't over-engineer from the start — let the questions your users actually ask guide the architecture.
This post is part of the AI Agents series. Previous posts covered tool calling, ReAct agents, and deep search.