Feature Stores:
Personalizing RAG
Vector DBs store Documents (Knowledge). Feature Stores store Attributes (Facts).
If you want your LLM to know "User last purchased X 5 minutes ago", you don't use a Vector Search. You use a Feature Store.
1. Feast (Feature Store)
Feast bridges your offline data warehouse (Snowflake/BigQuery) and your online inference (Redis).
```python
# feature_definitions.py
from datetime import timedelta

from feast import FeatureView, Field
from feast.types import Int64, String

user_clicks = FeatureView(
    name="user_clicks_last_24h",
    entities=[user],        # Entity defined elsewhere in the feature repo
    ttl=timedelta(days=1),
    schema=[
        Field(name="click_count", dtype=Int64),
        Field(name="favorite_category", dtype=String),
    ],
    online=True,            # Sync to Redis
    source=parquet_source,  # FileSource defined elsewhere in the repo
)
```

2. The RAG Workflow
The "Context" you send to an LLM should be a mix of Unstructured Data (Vector) and Structured Data (Feature Store).
Step 1: Retrieve user features from Feast (~5ms).

```python
user_ctx = store.get_online_features(
    features=["user_behavior:favorite_category"],
    entity_rows=[{"user_id": 123}],
).to_dict()
```

Step 2: Use the features to filter the vector search.

```python
docs = collection.query(
    query_texts=["shoes"],
    where={"category": user_ctx["favorite_category"][0]},
)
```

Step 3: Generate.
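To see the shape of the whole flow without any infrastructure, here is a minimal, self-contained sketch. The in-memory `FEATURE_STORE` and `DOCS` stand in for the Redis online store and the vector database, and all values are hypothetical:

```python
# In-memory stand-ins: a real system would hit Redis (via Feast) and a
# vector DB here. Data below is purely illustrative.
FEATURE_STORE = {
    "user:123": {"favorite_category": "shoes", "click_count": 14},
}

DOCS = [
    {"text": "Trail running shoes with extra grip", "category": "shoes"},
    {"text": "Insulated winter parka", "category": "coats"},
]

def get_online_features(entity_key: str) -> dict:
    # Step 1: in production this is a ~5ms key-value lookup
    return FEATURE_STORE[entity_key]

def vector_query(query: str, metadata_filter: dict) -> list:
    # Step 2: a real system does semantic search; here we only apply
    # the metadata filter derived from the user's features
    return [d for d in DOCS if d["category"] == metadata_filter["category"]]

def build_prompt(user_id: int, query: str) -> str:
    ctx = get_online_features(f"user:{user_id}")
    docs = vector_query(query, {"category": ctx["favorite_category"]})
    # Step 3: structured facts + retrieved documents go into one prompt
    return f"User prefers {ctx['favorite_category']}.\nDocs: {docs}\nQ: {query}"

print(build_prompt(123, "what should I buy?"))
```

Note that the feature lookup drives the retrieval filter: the same query returns different documents for users with different `favorite_category` values.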
3. Why Vector Databases Alone Aren't Enough
Consider building a personalized AI shopping assistant. Your vector database contains thousands of product descriptions. When a user asks "What should I buy?", a pure vector RAG system retrieves generic popular products—the same results for everyone. But customers have radically different contexts:
- User A bought running shoes last week and is training for a marathon
- User B browsed winter coats for 20 minutes yesterday but didn't buy
- User C has a $500 budget limit set in their profile
None of this personalization metadata lives in a vector database. It lives in event streams (Kafka), data warehouses (Snowflake), or real-time databases (Redis). A Feature Store is the bridge that makes this data available at inference time in under 10ms.
4. Understanding Feature Store Architecture
A Feature Store has two critical layers:
Offline Store
Historical feature data in a data warehouse (Snowflake, BigQuery, Parquet). Used for training ML models and computing batch features. Latency: minutes to hours. Examples: "30-day purchase history", "lifetime spend".
Online Store
Recent feature data synced to a low-latency store (Redis, DynamoDB). Used for real-time inference. Latency: under 10ms. Examples: "viewed in last hour", "current cart total".
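In Feast, these two layers are wired together in the repo's `feature_store.yaml`. A minimal sketch might look like this, assuming a file-based offline store and a local Redis online store (the project name, registry path, and connection string are placeholders):

```yaml
# feature_store.yaml (illustrative)
project: rag_personalization
registry: data/registry.db   # where Feast tracks feature definitions
provider: local
offline_store:
  type: file                 # Parquet files as the offline store
online_store:
  type: redis                # low-latency store for inference
  connection_string: "localhost:6379"
```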
5. Full Feast Implementation Example
Define Your Features
```python
# feature_definitions.py
from datetime import timedelta

from feast import Entity, FeatureView, Field, FileSource
from feast.types import Float32, Int64, String

# Define the entity (the "key" for lookups)
user = Entity(name="user_id", join_keys=["user_id"])

# Define features from your data source
user_behavior = FeatureView(
    name="user_behavior",
    entities=[user],
    ttl=timedelta(days=30),
    schema=[
        Field(name="favorite_category", dtype=String),
        Field(name="avg_order_value", dtype=Float32),
        Field(name="total_purchases_30d", dtype=Int64),
        Field(name="last_viewed_brand", dtype=String),
        Field(name="price_sensitivity_score", dtype=Float32),  # 0-1; higher = more price sensitive
    ],
    online=True,  # Sync to Redis for real-time access
    source=FileSource(path="s3://my-bucket/user-features.parquet"),
)
```

Materialize Features to Online Store
```python
from datetime import datetime, timedelta

from feast import FeatureStore

store = FeatureStore(repo_path=".")

# Run this daily to sync the offline store to the online store
store.materialize(
    start_date=datetime.now() - timedelta(days=30),
    end_date=datetime.now(),
)
# This copies features from S3 to Redis so they're available in <10ms
```

Real-Time Personalized RAG Query
```python
import chromadb

chroma = chromadb.Client()
collection = chroma.get_collection("products")

async def personalized_rag(user_id, query):
    # Step 1: Fetch user features in real-time (~5ms)
    user_features = store.get_online_features(
        features=[
            "user_behavior:favorite_category",
            "user_behavior:avg_order_value",
            "user_behavior:price_sensitivity_score",
        ],
        entity_rows=[{"user_id": user_id}],
    ).to_dict()
    fav_category = user_features["favorite_category"][0]
    max_price = user_features["avg_order_value"][0] * 1.5  # e.g., for a price filter
    price_sensitive = user_features["price_sensitivity_score"][0] > 0.7

    # Step 2: Use features to filter the vector search
    vector_filter = {"category": fav_category}
    docs = collection.query(query_texts=[query], n_results=5, where=vector_filter)

    # Step 3: Build a personalized context and prompt
    price_note = "emphasize deals and value" if price_sensitive else "emphasize quality"
    context_str = f"User prefers: {fav_category}. {price_note}"
    prompt = f"{context_str}\n\nDocs: {docs['documents']}\n\nQuestion: {query}"
    return llm.complete(prompt)  # llm: whichever LLM client you use
```

6. Real-World Use Cases
E-Commerce Product Recommendations
Online retailers combine vector search (product catalog embeddings) with feature store data (purchase history, browsing sessions, cart activity) to create truly personalized recommendations. Without feature stores, every user sees the same "popular products" at the same price ranges. With feature stores, the system knows that User A always buys premium brands and recently viewed trail running shoes—dramatically improving conversion rates.
Financial Services Advisory
Fintech companies use feature stores in RAG systems to personalize financial advice. Features like "risk tolerance score", "current portfolio allocation", "life stage" (student, young professional, pre-retirement) are pulled from CRM systems and used to filter and rank relevant financial guidance documents. A conservative retiree and an aggressive day trader asking the same question should receive very different responses.
Healthcare Patient Triage
Hospital systems retrieve patient-specific features (current medications, allergies, recent lab values, appointment history) from clinical systems via feature stores and inject them into RAG prompts. When a patient asks about a medication interaction, the system can immediately cross-reference against their actual medication list rather than giving generic advice.
7. Alternatives to Feast
| Tool | Type | Best For |
|---|---|---|
| Feast | Open Source | Teams with existing ML infrastructure |
| Tecton | Managed SaaS | Enterprises wanting managed feature platform |
| Hopsworks | Open Source + Enterprise | Teams wanting MLOps + feature store together |
| Redis | Cache/DB | Simple real-time features without offline sync |
| DynamoDB | NoSQL DB | AWS-native feature serving for simple use cases |
Frequently Asked Questions
When should I use a feature store vs just querying my database directly?
Use a feature store when: (1) you need sub-10ms latency and your database query takes 50-200ms, (2) you need to serve features to multiple AI models/services consistently, (3) you need to maintain consistency between training-time features and inference-time features. For simple use cases with a single application, a direct Redis or database query is sufficient.
How do I handle users with no history (cold start)?
For new users with no feature data, fall back to population-level defaults. Collect a few onboarding signals (category preferences, budget range) and use those as initial features. After 2-3 interactions, you'll have enough behavior data to personalize. Most feature stores support "default" feature values for missing entity keys.
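The fallback pattern above can be sketched in a few lines. This is a minimal illustration with an in-memory dict standing in for the online store; the default values are hypothetical:

```python
# Population-level defaults used when a user has no feature history
# (values are illustrative).
POPULATION_DEFAULTS = {
    "favorite_category": "bestsellers",  # safe generic category
    "avg_order_value": 60.0,             # population median
    "price_sensitivity_score": 0.5,      # neutral prior
}

def features_with_fallback(online_store: dict, user_id: int) -> dict:
    """Return the user's features, backfilling any missing ones with defaults."""
    user_features = online_store.get(f"user:{user_id}") or {}
    # Defaults first, so any real per-user value overrides them
    return {**POPULATION_DEFAULTS, **user_features}

store = {"user:123": {"favorite_category": "shoes"}}
print(features_with_fallback(store, 123))  # known user: real category, default scores
print(features_with_fallback(store, 999))  # cold start: all defaults
```

Because the merge backfills per-feature rather than per-user, a partially known user (say, a category preference from onboarding but no purchase history yet) still gets sensible values for every feature.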
What's the latency overhead of adding a feature store to RAG?
A properly configured feature store adds 5-15ms to your request latency (Redis lookup). You can run this in parallel with your vector search to avoid sequential latency. Total overhead is typically under 10ms when parallelized—well worth the personalization gains.
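The parallelization is a one-line change with `asyncio.gather()`. Here is a runnable sketch in which `asyncio.sleep` stands in for the two I/O calls (the ~10ms and ~50ms delays are illustrative):

```python
import asyncio
import time

async def fetch_user_features(user_id: int) -> dict:
    await asyncio.sleep(0.01)  # stand-in for a ~10ms online-store lookup
    return {"favorite_category": "shoes"}

async def vector_search(query: str) -> list:
    await asyncio.sleep(0.05)  # stand-in for a ~50ms vector query
    return ["doc-1", "doc-2"]

async def personalized_rag(user_id: int, query: str):
    # Both I/O calls start immediately; total wait is the max, not the sum
    features, docs = await asyncio.gather(
        fetch_user_features(user_id),
        vector_search(query),
    )
    return features, docs

start = time.perf_counter()
features, docs = asyncio.run(personalized_rag(123, "running shoes"))
elapsed = time.perf_counter() - start
# elapsed is roughly the slower call (~50ms), not the sum (~60ms)
```

This only works when the feature fetch does not feed the vector filter; if you use features to build the `where` clause (as in the example above), the calls are sequential by necessity and the feature lookup's 5-15ms sits on the critical path.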
Next Steps
- Start Simple with Redis: Before using Feast, prove the concept by storing 3-5 user features in Redis and injecting them into your RAG prompt. Measure the click-through rate improvement.
- Identify Your Most Valuable Features: Survey your team: "What user data, if available at inference time, would most improve response quality?" Those are your first Feature Views.
- Install Feast: run `pip install 'feast[redis]'` and work through their quickstart to get familiar with the offline/online sync pattern.
- Parallelize Feature Fetching: Use `asyncio.gather()` to fetch user features and vector search results simultaneously, minimizing latency overhead.
Vivek, AI Engineer
Full-stack AI engineer with 4+ years building LLM-powered products, autonomous agents, and RAG pipelines. I've shipped AI features to production for startups and worked hands-on with GPT-4o, LangChain, LlamaIndex, and the Vercel AI SDK. I started OpnCrafter to share everything I wish I had when learning — no fluff, just working code and real-world context.