LlamaIndex: Mastering Your Data
Dec 29, 2025 • 22 min read
Naive RAG fails in production. Chunking a document into fixed 1,000-character chunks and retrieving the top-5 by embedding similarity is how developers start — and it produces mediocre results that frustrate users. The retrieved chunks lack context (a chunk starting mid-sentence doesn't tell the model what came before), miss related information across chunks, and struggle with tabular data. LlamaIndex was built specifically to solve these problems, providing a suite of advanced retrieval strategies that dramatically improve answer quality for complex enterprise document search.
1. Why Naive RAG Fails
Fixed-size text chunking breaks documents at arbitrary boundaries, creating chunks that:
- Lose context: "The figure below shows..." doesn't work if the figure is in the next chunk
- Miss relationships: A quarterly report's summary chunk doesn't contain the detailed numbers, and vice versa
- Break sentences: 1,000 characters often cuts mid-sentence, creating nonsensical embeddings
- Destroy tables: CSV-like content chunked arbitrarily loses row/column relationships
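These failure modes are easy to reproduce. The sketch below (pure Python, with made-up text and an arbitrary 80-character chunk size for illustration) shows a fixed-size splitter cutting mid-word and stranding a figure reference away from its figure:

```python
# Toy demo: fixed-size chunking cuts sentences and separates related facts.
text = (
    "Q3 revenue grew 18% year over year, driven by enterprise subscriptions. "
    "The figure below breaks growth down by region. "
    "EMEA led with 24% growth, while APAC grew 11%."
)

def fixed_size_chunks(text: str, size: int) -> list[str]:
    """Split text into fixed-size chunks at arbitrary character boundaries."""
    return [text[i:i + size] for i in range(0, len(text), size)]

chunks = fixed_size_chunks(text, 80)
for c in chunks:
    print(repr(c))
# The first chunk ends mid-word, so its embedding blends two topics, and
# "the figure below" lands in a different chunk than the regional numbers.
```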
LlamaIndex's advanced indexing strategies maintain document structure and provide retrieval mechanisms tuned for different query types.
2. Sentence Window Retrieval: Small-to-Big
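Before reaching for LlamaIndex, the small-to-big idea fits in a few lines of plain Python. This is a toy sketch: keyword overlap stands in for embedding similarity, and the sentences, query, and window size are invented for illustration:

```python
import re

# Toy sketch of small-to-big: match at SENTENCE level for precision,
# then hand the LLM a WINDOW of neighboring sentences for context.
sentences = [
    "The company was founded in 2010.",
    "It initially sold hardware.",
    "In Q3, revenue grew 18% year over year.",
    "Growth was driven by enterprise subscriptions.",
    "Headcount remained flat.",
]

def tokens(s: str) -> set[str]:
    """Lowercase word tokens, punctuation stripped."""
    return set(re.findall(r"[a-z0-9%]+", s.lower()))

def retrieve_with_window(query: str, window: int = 1) -> str:
    """Find the best-matching sentence, then return it plus `window` neighbors."""
    scores = [len(tokens(query) & tokens(s)) for s in sentences]
    best = scores.index(max(scores))
    lo, hi = max(0, best - window), min(len(sentences), best + window + 1)
    return " ".join(sentences[lo:hi])  # the expanded context handed to the LLM

print(retrieve_with_window("What was Q3 revenue growth?"))
```

The match is a single precise sentence, but the returned window also carries the neighboring sentence explaining *why* revenue grew, which is exactly what the LLM needs to synthesize a good answer.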
pip install llama-index llama-index-embeddings-openai llama-index-llms-openai
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
from llama_index.core.node_parser import SentenceWindowNodeParser
from llama_index.core.postprocessor import MetadataReplacementPostProcessor
from llama_index.core import Settings
from llama_index.llms.openai import OpenAI
from llama_index.embeddings.openai import OpenAIEmbedding
# Configure models
Settings.llm = OpenAI(model="gpt-4o-mini", temperature=0)
Settings.embed_model = OpenAIEmbedding(model="text-embedding-3-small")
# === SENTENCE WINDOW RETRIEVAL ===
# Strategy: Index individual sentences (precise retrieval)
# BUT: When you find a match, return a window of surrounding sentences to the LLM
documents = SimpleDirectoryReader("./docs").load_data()
# SentenceWindowNodeParser creates sentence-level nodes
# Each node stores its surrounding context (5 sentences each side) in metadata
node_parser = SentenceWindowNodeParser.from_defaults(
window_size=5, # Include 5 sentences before AND after the matched sentence
window_metadata_key="window",
original_text_metadata_key="original_text",
)
nodes = node_parser.get_nodes_from_documents(documents)
# Build vector index over individual sentences
index = VectorStoreIndex(nodes)
# MetadataReplacementPostProcessor: swap the sentence for its surrounding window
# during the LLM synthesis step (the LLM gets rich context, retrieval is precise)
postproc = MetadataReplacementPostProcessor(target_metadata_key="window")
query_engine = index.as_query_engine(
similarity_top_k=3, # Retrieve top 3 matching SENTENCES
node_postprocessors=[postproc], # Expand each to its 11-sentence window for LLM
)
response = query_engine.query("What is the Q3 revenue growth rate?")
print(response)
# Finds the exact sentence mentioning Q3 revenue, then gives LLM the surrounding
# 11 sentences for context — much better than a random 1,000-char chunk!
3. Auto-Merging Retrieval: Hierarchical Chunking
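The merge rule at the heart of this strategy is simple. Here is a toy sketch in plain Python (hypothetical parent/child IDs; the real retriever works over node objects and similarity scores):

```python
# Toy sketch of the auto-merging rule: if enough of a parent's children are
# retrieved, replace them with the parent for more coherent context.
parent_to_children = {
    "P1": ["c1", "c2", "c3", "c4"],   # hypothetical parent with 4 leaf chunks
    "P2": ["c5", "c6", "c7"],
}

def auto_merge(retrieved: list[str], ratio_thresh: float = 0.4) -> list[str]:
    """Swap retrieved leaves for their parent when coverage >= ratio_thresh."""
    merged: list[str] = []
    for parent, children in parent_to_children.items():
        hits = [c for c in retrieved if c in children]
        if hits and len(hits) / len(children) >= ratio_thresh:
            merged.append(parent)    # enough siblings retrieved → use the parent
        else:
            merged.extend(hits)      # too few → keep the individual leaves
    return merged

print(auto_merge(["c1", "c2", "c5"]))  # → ['P1', 'c5']
```

Half of P1's children were retrieved (above the 40% threshold), so they merge into P1; only one of P2's three children matched, so that leaf is kept as-is.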
from llama_index.core.node_parser import HierarchicalNodeParser
from llama_index.core.retrievers import AutoMergingRetriever
from llama_index.core.storage.docstore import SimpleDocumentStore
from llama_index.core import StorageContext
# Hierarchical chunking (three levels, matching chunk_sizes below):
# Level 1 (Parent): large chunks (~2048 tokens) — broad context
# Level 2 (Intermediate): medium chunks (~512 tokens)
# Level 3 (Leaf): small chunks (~128 tokens) — searched for precise retrieval
node_parser = HierarchicalNodeParser.from_defaults(
chunk_sizes=[2048, 512, 128], # Parent → Child → Leaf hierarchy
)
nodes = node_parser.get_nodes_from_documents(documents)
# Store ALL nodes (parents + children) in docstore
docstore = SimpleDocumentStore()
docstore.add_documents(nodes)
storage_context = StorageContext.from_defaults(docstore=docstore)
# Build index over ONLY leaf nodes (smallest chunks)
from llama_index.core.node_parser import get_leaf_nodes
leaf_nodes = get_leaf_nodes(nodes)
index = VectorStoreIndex(leaf_nodes, storage_context=storage_context)
# AutoMergingRetriever: retrieve leaf nodes, but if >X% of a parent's children
# are retrieved, swap them for the parent (more coherent context for LLM)
base_retriever = index.as_retriever(similarity_top_k=6)
retriever = AutoMergingRetriever(
base_retriever,
storage_context,
verbose=True,
simple_ratio_thresh=0.4, # Merge if 40%+ of parent's children are retrieved
)
from llama_index.core.query_engine import RetrieverQueryEngine
query_engine = RetrieverQueryEngine(retriever=retriever)
response = query_engine.query("What are the risks mentioned in the contract?")
4. Router Engine: Multi-Source RAG
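In LlamaIndex the routing decision is made by an LLM reading tool descriptions, but the mechanics can be sketched without one. In this toy version (keyword overlap stands in for the LLM selector; tool names and descriptions are invented), each query goes to the tool whose description it best matches:

```python
import re

# Toy sketch of routing: score each tool's description against the query
# and pick the best match. A real RouterQueryEngine uses LLM reasoning.
tools = {
    "product_documentation": "product features specifications user guides api",
    "contracts": "legal agreements payment terms sla refund fees",
    "inventory_database": "inventory stock levels order status sku units",
}

def select_tool(query: str) -> str:
    """Pick the tool whose description shares the most words with the query."""
    q = set(re.findall(r"[a-z0-9]+", query.lower()))
    return max(tools, key=lambda name: len(q & set(tools[name].split())))

print(select_tool("How many units of SKU-123 are in stock?"))
# → inventory_database
```

The LLM-based selector in the real engine generalizes far beyond keyword overlap (it can route "Why was I charged a late fee?" to contracts with no shared words), but the shape of the decision is the same: one query, several described engines, one choice.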
from llama_index.core.tools import QueryEngineTool
from llama_index.core.query_engine import RouterQueryEngine
from llama_index.core.selectors import LLMSingleSelector
# Build separate indices for each data source
product_docs_engine = VectorStoreIndex.from_documents(
SimpleDirectoryReader("./product_docs").load_data()
).as_query_engine()
contracts_engine = VectorStoreIndex.from_documents(
SimpleDirectoryReader("./contracts").load_data()
).as_query_engine()
# SQL engine for database queries
from llama_index.core import SQLDatabase
from sqlalchemy import create_engine as create_sql_engine
from llama_index.core.query_engine import NLSQLTableQueryEngine
sql_engine = create_sql_engine("postgresql://user:pass@localhost/inventory")
sql_database = SQLDatabase(sql_engine, include_tables=["inventory", "orders"])
sql_query_engine = NLSQLTableQueryEngine(sql_database=sql_database)
# Wrap each engine with a description the router uses to route questions
product_tool = QueryEngineTool.from_defaults(
query_engine=product_docs_engine,
name="product_documentation",
description="Answers questions about product features, specifications, and user guides."
)
contract_tool = QueryEngineTool.from_defaults(
query_engine=contracts_engine,
name="contracts",
description="Answers questions about legal agreements, payment terms, and SLAs."
)
inventory_tool = QueryEngineTool.from_defaults(
query_engine=sql_query_engine,
name="inventory_database",
description="Queries real-time inventory levels, order status, and stock availability."
)
# Router selects the best tool for each query using LLM reasoning
router = RouterQueryEngine(
selector=LLMSingleSelector.from_defaults(),
query_engine_tools=[product_tool, contract_tool, inventory_tool],
verbose=True, # Shows which engine was selected and why
)
# "Why was I charged a late fee?" → Routes to contracts
# "How many units of SKU-123 are in stock?" → Routes to SQL database
# "How do I configure the API authentication?" → Routes to product docs
response = router.query("What's the refund policy for enterprise customers?")
print(response)  # Selected: contracts → accurate answer from the right source
Frequently Asked Questions
When should I use LlamaIndex vs LangChain for RAG?
LlamaIndex excels at data ingestion and complex retrieval: hierarchical indexing, multi-source routing, structured data integration, and the widest variety of advanced retrieval strategies. Use LlamaIndex when your primary challenge is improving retrieval quality from complex documents. LangChain excels at orchestration and agent workflows: chaining multiple LLM calls, building complex agent loops, and integrating with diverse external services. Many production systems use both: LlamaIndex for data/retrieval layers and LangChain for the agent orchestration layer above it.
How does LlamaParse compare to Unstructured for PDF parsing?
LlamaParse is a cloud-only service from the LlamaIndex team, optimized for parsing documents into Markdown that LlamaIndex retrieval pipelines can work with efficiently. It handles complex tables, multi-column layout, and figures well. Unstructured is open source and can run locally (critical for data privacy), handles more document types (PPT, MSG, HTML), and provides more granular element categorization. For maximum privacy: Unstructured. For easiest LlamaIndex integration with complex PDFs: LlamaParse.
Conclusion
LlamaIndex's advanced retrieval strategies address the fundamental limitations of naive RAG. Sentence Window Retrieval searches precisely at the sentence level while handing the LLM rich surrounding context. Auto-Merging Retrieval uses hierarchical document structure to return coherent large sections when multiple related chunks are retrieved. Router Engines direct each question to the right data source — vector store, SQL database, or API — based on its type. Together, these strategies can substantially improve answer quality over a naive chunk-and-embed pipeline on complex enterprise documents.
Vivek
AI Engineer
Full-stack AI engineer with 4+ years building LLM-powered products, autonomous agents, and RAG pipelines. I've shipped AI features to production for startups and worked hands-on with GPT-4o, LangChain, LlamaIndex, and the Vercel AI SDK. I started OpnCrafter to share everything I wish I had when learning — no fluff, just working code and real-world context.