
Feature Stores: Personalizing RAG

Vector DBs store Documents (Knowledge). Feature Stores store Attributes (Facts).

If you want your LLM to know "User last purchased X 5 minutes ago", you don't use a Vector Search. You use a Feature Store.

1. Feast (Feature Store)

Feast bridges your offline data warehouse (Snowflake/BigQuery) and your online inference (Redis).

# feature_definitions.py
# (the imports, the `user` entity, and `parquet_source` are defined
#  as in the full example in section 5 below)

user_clicks = FeatureView(
    name="user_clicks_last_24h",
    entities=[user],
    ttl=timedelta(days=1),
    schema=[
        Field(name="click_count", dtype=Int64),
        Field(name="favorite_category", dtype=String),
    ],
    online=True,  # Sync to Redis
    source=parquet_source,
)

2. The RAG Workflow

The "Context" you send to an LLM should be a mix of Unstructured Data (Vector) and Structured Data (Feature Store).

Step 1: Retrieve User Features from Feast (5ms).

user_ctx = store.get_online_features(
    features=["user_clicks_last_24h:favorite_category"],
    entity_rows=[{"user_id": 123}],
).to_dict()

Step 2: Use features to filter Vector Search.

docs = collection.query(
    query_texts=["shoes"],
    where={"category": user_ctx["favorite_category"][0]},
)

Step 3: Generate.

3. Why Vector Databases Alone Aren't Enough

Consider building a personalized AI shopping assistant. Your vector database contains thousands of product descriptions. When a user asks "What should I buy?", a pure vector RAG system retrieves generic popular products—the same results for everyone. But customers have radically different contexts:

  • User A bought running shoes last week and is training for a marathon
  • User B browsed winter coats for 20 minutes yesterday but didn't buy
  • User C has a $500 budget limit set in their profile

None of this personalization metadata lives in a vector database. It lives in event streams (Kafka), data warehouses (Snowflake), or real-time databases (Redis). A Feature Store is the bridge that makes this data available at inference time in under 10ms.

4. Understanding Feature Store Architecture

A Feature Store has two critical layers:

Offline Store

Historical feature data in a data warehouse (Snowflake, BigQuery, Parquet). Used for training ML models and computing batch features. Latency: minutes to hours. Examples: "30-day purchase history", "lifetime spend".

Online Store

Recent feature data synced to a low-latency store (Redis, DynamoDB). Used for real-time inference. Latency: under 10ms. Examples: "viewed in last hour", "current cart total".
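In Feast, these two layers are wired together in `feature_store.yaml`; a minimal sketch assuming a local Parquet offline store and a Redis online store (the project name and connection details are illustrative):

```yaml
project: shop_assistant        # illustrative project name
registry: data/registry.db     # where feature definitions are tracked
provider: local
online_store:
  type: redis
  connection_string: "localhost:6379"
offline_store:
  type: file                   # Parquet files (the offline layer)
```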

5. Full Feast Implementation Example

Define Your Features

# feature_definitions.py
from feast import Entity, FeatureView, Field, FileSource
from feast.types import Float32, String, Int64
from datetime import timedelta

# Define the entity (the "key" for lookups)
user = Entity(name="user_id", join_keys=["user_id"])

# Define features from your data source
user_behavior = FeatureView(
    name="user_behavior",
    entities=[user],
    ttl=timedelta(days=30),
    schema=[
        Field(name="favorite_category", dtype=String),
        Field(name="avg_order_value", dtype=Float32),
        Field(name="total_purchases_30d", dtype=Int64),
        Field(name="last_viewed_brand", dtype=String),
        Field(name="price_sensitivity_score", dtype=Float32),  # 0-1, higher = more price sensitive
    ],
    online=True,  # Sync to Redis for real-time access
    source=FileSource(path="s3://my-bucket/user-features.parquet")
)

Materialize Features to Online Store

from feast import FeatureStore
from datetime import datetime, timedelta

store = FeatureStore(repo_path=".")

# Run this daily to sync offline → online store
store.materialize(
    start_date=datetime.now() - timedelta(days=30),
    end_date=datetime.now(),
)
# This copies features from S3 → Redis so they're available in <10ms

Real-Time Personalized RAG Query

import chromadb

chroma = chromadb.Client()
collection = chroma.get_collection("products")

def personalized_rag(user_id, query):
    # Step 1: Fetch user features in real time (~5ms Redis lookup)
    user_features = store.get_online_features(
        features=[
            "user_behavior:favorite_category",
            "user_behavior:avg_order_value",
            "user_behavior:price_sensitivity_score",
        ],
        entity_rows=[{"user_id": user_id}],
    ).to_dict()

    fav_category = user_features["favorite_category"][0]
    max_price = user_features["avg_order_value"][0] * 1.5
    price_sensitive = user_features["price_sensitivity_score"][0] > 0.7

    # Step 2: Use features to filter the vector search
    vector_filter = {
        "$and": [
            {"category": fav_category},
            {"price": {"$lte": max_price}},
        ]
    }
    docs = collection.query(query_texts=[query], n_results=5, where=vector_filter)

    # Step 3: Build a personalized context and prompt
    price_note = "emphasize deals and value" if price_sensitive else "emphasize quality"
    prompt = (
        f"User prefers: {fav_category}. {price_note}\n\n"
        f"Docs: {docs['documents']}\n\n"
        f"Question: {query}"
    )
    return llm.complete(prompt)  # `llm` is your LLM client (e.g. an OpenAI wrapper)

6. Real-World Use Cases

E-Commerce Product Recommendations

Online retailers combine vector search (product catalog embeddings) with feature store data (purchase history, browsing sessions, cart activity) to create truly personalized recommendations. Without feature stores, every user sees the same "popular products" at the same price ranges. With feature stores, the system knows that User A always buys premium brands and recently viewed trail running shoes—dramatically improving conversion rates.

Financial Services Advisory

Fintech companies use feature stores in RAG systems to personalize financial advice. Features like "risk tolerance score", "current portfolio allocation", "life stage" (student, young professional, pre-retirement) are pulled from CRM systems and used to filter and rank relevant financial guidance documents. A conservative retiree and an aggressive day trader asking the same question should receive very different responses.

Healthcare Patient Triage

Hospital systems retrieve patient-specific features (current medications, allergies, recent lab values, appointment history) from clinical systems via feature stores and inject them into RAG prompts. When a patient asks about a medication interaction, the system can immediately cross-reference against their actual medication list rather than giving generic advice.

7. Alternatives to Feast

| Tool | Type | Best For |
| --- | --- | --- |
| Feast | Open Source | Teams with existing ML infrastructure |
| Tecton | Managed SaaS | Enterprises wanting a managed feature platform |
| Hopsworks | Open Source + Enterprise | Teams wanting MLOps + feature store together |
| Redis | Cache/DB | Simple real-time features without offline sync |
| DynamoDB | NoSQL DB | AWS-native feature serving for simple use cases |

Frequently Asked Questions

When should I use a feature store vs just querying my database directly?

Use a feature store when: (1) you need sub-10ms latency and your database query takes 50-200ms, (2) you need to serve features to multiple AI models/services consistently, (3) you need to maintain consistency between training-time features and inference-time features. For simple use cases with a single application, a direct Redis or database query is sufficient.

How do I handle users with no history (cold start)?

For new users with no feature data, fall back to population-level defaults. Collect a few onboarding signals (category preferences, budget range) and use those as initial features. After 2-3 interactions, you'll have enough behavior data to personalize. Most feature stores support "default" feature values for missing entity keys.
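Feast returns None values for entity keys it has never seen, so the fallback can be a small dict merge; a minimal sketch with illustrative defaults:

```python
# Population-level defaults for brand-new users (values are illustrative)
POPULATION_DEFAULTS = {
    "favorite_category": "bestsellers",
    "avg_order_value": 60.0,
    "price_sensitivity_score": 0.5,
}

def with_defaults(features, defaults=POPULATION_DEFAULTS):
    """Replace None (missing entity key) values with population-level defaults."""
    return {
        k: features.get(k) if features.get(k) is not None else d
        for k, d in defaults.items()
    }

# A user with no history comes back from the online store as all-None values
print(with_defaults({"favorite_category": None, "avg_order_value": None,
                     "price_sensitivity_score": None}))
```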

What's the latency overhead of adding a feature store to RAG?

A properly configured feature store adds 5-15ms per request (a Redis lookup). When features are only injected into the prompt rather than used to filter retrieval, you can run the feature fetch in parallel with your vector search, so the added wall-clock latency is close to zero, which is well worth the personalization gains.
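A minimal `asyncio.gather` sketch of that parallel retrieval. Note this only works when features feed the prompt, not the vector-search filter (a filtered search must wait for the features), and since the real Feast and Chroma clients are synchronous you would wrap them with `asyncio.to_thread`; here the fetchers are stand-ins:

```python
import asyncio

async def fetch_user_features(user_id):
    # Stand-in for store.get_online_features(...) - a ~5ms Redis lookup
    await asyncio.sleep(0.005)
    return {"favorite_category": "running"}

async def vector_search(query):
    # Stand-in for collection.query(...) - often the slower of the two
    await asyncio.sleep(0.03)
    return ["Trail runner X", "Road racer Y"]

async def retrieve(user_id, query):
    # Both coroutines run concurrently: total wait is roughly the max, not the sum
    return await asyncio.gather(fetch_user_features(user_id), vector_search(query))

features, docs = asyncio.run(retrieve("user-123", "shoes"))
```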

Next Steps

  • Start Simple with Redis: Before using Feast, prove the concept by storing 3-5 user features in Redis and injecting them into your RAG prompt. Measure the click-through rate improvement.
  • Identify Your Most Valuable Features: Survey your team: "What user data, if available at inference time, would most improve response quality?" Those are your first Feature Views.
  • Install Feast: pip install 'feast[redis]' and work through their quickstart to get familiar with the offline/online sync pattern.
  • Parallelize Feature Fetching: Use asyncio.gather() to fetch user features and vector search results simultaneously, minimizing latency overhead.
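The "Start Simple with Redis" step might look like this. The key layout is hypothetical, and the Redis calls are shown commented so the prompt-injection helper stays runnable on its own:

```python
# With a live server you would fetch a hash of features:
#   import redis
#   r = redis.Redis(decode_responses=True)
#   r.hset("user:123", mapping={"favorite_category": "running", "budget": "150"})
#   feats = r.hgetall("user:123")
feats = {"favorite_category": "running", "budget": "150"}  # stand-in for hgetall

def inject_features(prompt, feats):
    """Prepend a one-line user profile to the RAG prompt."""
    profile = ", ".join(f"{k}={v}" for k, v in sorted(feats.items()))
    return f"[User profile: {profile}]\n{prompt}"

print(inject_features("What should I buy?", feats))
```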

Written by Vivek, AI Engineer

Full-stack AI engineer with 4+ years building LLM-powered products, autonomous agents, and RAG pipelines. I've shipped AI features to production for startups and worked hands-on with GPT-4o, LangChain, LlamaIndex, and the Vercel AI SDK. I started OpnCrafter to share everything I wish I had when learning — no fluff, just working code and real-world context.
