How It Works
Design Philosophy
Vector Graph RAG is built on three key principles:
1. No Graph Database
Traditional Graph RAG systems store knowledge in a graph database (Neo4j, ArangoDB, etc.) and use graph traversal queries (Cypher, Gremlin) to retrieve relevant subgraphs. This adds operational complexity: another database to deploy, a query language to learn, and a schema to maintain.
We store the entire knowledge graph (entities, relations, and passages) as vectors in Milvus. Retrieval becomes vector similarity search, which is simple and scalable and requires no infrastructure beyond the vector database itself.
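As a rough illustration of this idea, the sketch below keeps entities, relations, and passages as records in one in-memory list and retrieves them by cosine similarity. It is a toy stand-in, not Milvus: the `embed` function (a bag-of-words counter), the record layout, and the `search` helper are all assumptions made for this example.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Hypothetical stand-in for a real embedding model: bag of lowercase words."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


# Every graph element is one record: its kind, its text, and (below) its vector.
records = [
    {"kind": "entity", "text": "Einstein"},
    {"kind": "entity", "text": "theory of relativity"},
    {"kind": "relation", "text": "Einstein developed theory of relativity"},
    {"kind": "relation", "text": "Einstein worked at Princeton"},
    {"kind": "passage", "text": "Einstein developed the theory of relativity at Princeton."},
]
for record in records:
    record["vector"] = embed(record["text"])


def search(query: str, kind: str, top_k: int = 2) -> list[str]:
    """Return the top_k records of the given kind, most similar first."""
    q = embed(query)
    hits = [r for r in records if r["kind"] == kind]
    hits.sort(key=lambda r: cosine(q, r["vector"]), reverse=True)
    return [r["text"] for r in hits[:top_k]]


print(search("who developed relativity", "relation", top_k=1))
```

In a real deployment the list would be replaced by Milvus collections and `embed` by an actual embedding model; the point here is only that graph lookups reduce to nearest-neighbour queries over records.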
2. Single-Pass LLM Reranking
Many RAG systems use iterative, agentic retrieval — the LLM decides what to retrieve next, reflects on results, and repeats. For example:
- IRCoT (Interleaving Retrieval with Chain-of-Thought) alternates between retrieval and reasoning over multiple rounds
- Self-RAG uses the LLM to critique and re-retrieve documents
- Agentic RAG gives the LLM tools to search iteratively
These approaches are powerful but expensive: every iteration adds another LLM call, which increases both latency and cost.
Vector Graph RAG uses a single LLM reranking pass. After vector search and subgraph expansion produce candidate relations, the LLM scores them once. One pass is sufficient because vector search and subgraph expansion already yield high-quality candidates, so a single reranking step is enough to filter them down to the most relevant relations.
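A minimal sketch of what a single reranking pass could look like is shown below. The prompt wording, the JSON score format, and the `rerank_once` and `fake_llm` names are assumptions for illustration, not the project's actual prompt or API; `fake_llm` is a stub that returns fixed scores so the example runs without a model.

```python
import json
from typing import Callable


def rerank_once(question: str, candidates: list[str],
                llm: Callable[[str], str], top_k: int = 1) -> list[str]:
    """Score every candidate in one LLM call and keep the top_k."""
    numbered = "\n".join(f"{i}: {c}" for i, c in enumerate(candidates))
    prompt = (
        f"Question: {question}\n"
        f"Candidate relations:\n{numbered}\n"
        "Return a JSON object mapping each index to a relevance score from 0 to 1."
    )
    scores = json.loads(llm(prompt))  # the only LLM call in this step
    ranked = sorted(
        range(len(candidates)),
        key=lambda i: float(scores.get(str(i), 0.0)),
        reverse=True,
    )
    return [candidates[i] for i in ranked[:top_k]]


calls = []


def fake_llm(prompt: str) -> str:
    """Hypothetical stub standing in for a real model; returns fixed scores."""
    calls.append(prompt)
    return json.dumps({"0": 0.95, "1": 0.20})


best = rerank_once(
    "What did Einstein develop?",
    ["(Einstein, developed, theory of relativity)", "(Einstein, worked at, Princeton)"],
    fake_llm,
)
print(best, "LLM calls:", len(calls))
```

Because all candidates go into one prompt, the number of LLM calls stays constant regardless of how many relations the expansion step returns.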
3. Knowledge-Intensive Domains
Vector Graph RAG is especially effective for knowledge-intensive content — documents where dense factual relationships are the core value:
| Domain | Why Graph RAG Helps |
|---|---|
| Legal | Statutes reference other statutes, precedents cite precedents — graph captures these cross-references |
| Finance | Company relationships, ownership chains, transaction flows form natural graphs |
| Medical | Drug interactions, symptom-disease-treatment pathways are inherently relational |
| Literature | Character relationships, plot connections, thematic links across chapters |
| Academic | Citation networks, concept dependencies, methodology chains |
In these domains, naive RAG often fails because the answer requires connecting facts across multiple documents. The knowledge graph captures these connections explicitly.
Architecture
Indexing Pipeline
```mermaid
flowchart LR
A[Documents] --> B[Triplet Extraction\n**LLM**]
B --> C[Entities + Relations]
C --> D[Embedding]
D --> E[Milvus]
```
- Triplet Extraction: An LLM extracts `(subject, predicate, object)` triplets from each document.
- Entity & Relation Storage: Entities and relations are stored as vectors in Milvus collections.
- Embedding: All text is embedded for vector similarity search.
Query Pipeline
```mermaid
flowchart LR
A[Question] --> B[Entity Extraction]
B --> C[Vector Search]
C --> D[Subgraph Expansion]
D --> E[LLM Reranking]
E --> F[Answer]
```
- Entity Extraction — Extract key entities from the user's question.
- Vector Search — Find similar entities and relations in Milvus.
- Subgraph Expansion — Collect candidate relations by expanding around matched entities.
- LLM Reranking — Use an LLM to score and filter the most relevant relations (single pass).
- Answer Generation — Generate the final answer from the selected context.
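The subgraph expansion step can be sketched as a bounded breadth-first walk over an entity-to-relation index. This is an assumption about how expansion might work, not the project's implementation: the `hops` parameter, the `expand` helper, and the third relation (added only so that a second hop has something to find) are all hypothetical.

```python
from collections import defaultdict

# Relations as (subject, predicate, object) triplets; a toy graph for illustration.
relations = [
    ("Einstein", "developed", "theory of relativity"),
    ("Einstein", "worked at", "Princeton"),
    ("Princeton", "located in", "New Jersey"),
]

# Adjacency index: entity -> indices of the relations that mention it.
by_entity = defaultdict(list)
for idx, (subj, _, obj) in enumerate(relations):
    by_entity[subj].append(idx)
    by_entity[obj].append(idx)


def expand(seed_entities: list[str], hops: int = 1) -> list[tuple[str, str, str]]:
    """Collect relations reachable within `hops` steps of the seed entities."""
    frontier = set(seed_entities)
    seen_entities = set(seed_entities)
    picked: list[int] = []
    for _ in range(hops):
        next_frontier = set()
        for entity in frontier:
            for idx in by_entity.get(entity, []):
                if idx not in picked:
                    picked.append(idx)
                subj, _, obj = relations[idx]
                for neighbor in (subj, obj):
                    if neighbor not in seen_entities:
                        seen_entities.add(neighbor)
                        next_frontier.add(neighbor)
        frontier = next_frontier
    # Sort so the result does not depend on set iteration order.
    return [relations[i] for i in sorted(picked)]


print(expand(["Einstein"]))          # relations touching the seed entity
print(expand(["Einstein"], hops=2))  # also relations of its neighbours
```

The seeds would come from the vector search step, and the returned relations are the candidates handed to the single reranking call.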
Worked Example
Indexing
Given the passage: "Einstein developed the theory of relativity at Princeton."
- Entities: `Einstein`, `theory of relativity`, `Princeton`
- Relations: `(Einstein, developed, theory of relativity)`, `(Einstein, worked at, Princeton)`
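One way to picture the records this passage might produce is sketched below. The record fields (`relation_ids`, `passage_id`, `text`) and the `build_index` helper are assumptions chosen for the example, and `fake_extract` is a stub that stands in for the LLM extraction call by returning the two triplets listed above.

```python
from typing import Callable

Triplet = tuple[str, str, str]


def build_index(passages: list[str],
                extract: Callable[[str], list[Triplet]]) -> tuple[dict, list[dict]]:
    """Turn passages into entity and relation records that reference each other."""
    entities: dict[str, dict] = {}
    relations: list[dict] = []
    for passage_id, passage in enumerate(passages):
        for subj, pred, obj in extract(passage):
            rel_id = len(relations)
            relations.append({
                "id": rel_id,
                "text": f"{subj} {pred} {obj}",  # the string that would be embedded
                "triplet": (subj, pred, obj),
                "passage_id": passage_id,        # link back to the source passage
            })
            for name in (subj, obj):
                entities.setdefault(name, {"name": name, "relation_ids": []})
                entities[name]["relation_ids"].append(rel_id)
    return entities, relations


def fake_extract(passage: str) -> list[Triplet]:
    """Hypothetical stub for the LLM extraction call."""
    return [
        ("Einstein", "developed", "theory of relativity"),
        ("Einstein", "worked at", "Princeton"),
    ]


entities, relations = build_index(
    ["Einstein developed the theory of relativity at Princeton."], fake_extract
)
print(sorted(entities))
print(entities["Einstein"]["relation_ids"])
```

Keeping relation ids on each entity record is what later lets a query move from a matched entity to its candidate relations without a graph database.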
Querying
For the question: "What did Einstein develop?"
```mermaid
flowchart TD
Q["What did Einstein develop?"] --> E1["Extract entity: **Einstein**"]
E1 --> VS["Vector search → similar entities"]
VS --> SE["Subgraph expansion → candidate relations"]
SE --> R1["(Einstein, developed, theory of relativity)"]
SE --> R2["(Einstein, worked at, Princeton)"]
R1 --> LLM["LLM reranking (single pass)"]
R2 --> LLM
LLM --> A["Einstein developed the theory of relativity."]
```
- Extract entity: `Einstein`
- Vector search finds similar entities and relations
- Subgraph expansion collects candidate relations
- LLM reranking selects `(Einstein, developed, theory of relativity)` in one call, with no iteration
- Generate answer: "Einstein developed the theory of relativity."
Comparison with Other Approaches
| Approach | Graph DB | LLM Calls per Query | Iterative | Complexity |
|---|---|---|---|---|
| Naive RAG | No | 1 (generation) | No | Low |
| IRCoT | No | Multiple (retrieve + reason loops) | Yes | High |
| HippoRAG | No | 1-2 | No | Medium |
| Microsoft GraphRAG | Yes (Neo4j) | Multiple | Yes | High |
| Vector Graph RAG | No | 2 (rerank + generation) | No | Low |