opncrafter
⏱ 10–15 min readπŸŽ“ Intermediate β†’ AdvancedUpdated Apr 2026

Tracing Memory and Context in Agent Workflows

Agents are fundamentally stateless. Every action they take requires re-injecting exactly the right amount of memoryβ€”too little, and they hallucinate. Too much, and they get confused and exceed token limits.

The Memory Hierarchy

In production systems, agent memory is usually split into three horizontal layers:

  1. The Scratchpad (Short-Term): The step-by-step history of the current execution (e.g., "I just searched the web, and here is result #1").
  2. The User Session (Mid-Term): Conversation history over the last hour.
  3. The Global State (Long-Term): Enduring facts, vector database retrievals, and user preferences.

Tracing Context Decay

"Context Decay" occurs when critical information present in step 1 gets pushed so far up the LLM's context window by step 15 that the model "forgets" it or ignores it entirely (often called the "Lost in the Middle" phenomenon).

Visualizing the Window

When tracing your agent, you must graph the size of the messages array over the duration of the workflow. If the array grows linearly with every tool call, your agent will inevitably crash or degrade in logic logic.

Strategies for Context Management

1. The Rolling Summary

Instead of appending every raw tool output to the context window, inject a "Summarizer Node."

def summarize_scratchpad(raw_scratchpad):
    # If the agent has taken more than 5 steps, compress the history
    if len(raw_scratchpad) > 5:
        summary_prompt = f"Summarize these past actions into 3 bullet points: {raw_scratchpad}"
        compressed = llm(summary_prompt)
        return compressed
    return raw_scratchpad

2. Selective Information Passing

If an agent uses a tool to scrape a 10,000-word webpage, do not inject the entire webpage into the primary agent's brain. Instead, spawn a sub-agent.

The Pattern: Main Agent asks Sub-Agent: "Read this URL and extract only the CEO's name." The Sub-Agent processes the massive context and returns a 2-word string back to the Main Agent's scratchpad.

3. Tracing Context with Graphs

Frameworks like LangGraph treat memory as a strictly defined State object. By tracing State transitions, you know exactly what variables the agent had access to at any given microsecond.

from typing import TypedDict, List

# By adhering to a strict State typing, tracing becomes trivial.
# We can dump this dict to a database at every step.
class AgentState(TypedDict):
    task_intent: str
    current_doc_chunks: List[str]
    errors_encountered: int
    final_answer: str

4. Long-Term State: The Vector DB Boundary

Short-term scratchpads are cleared when the agent terminates. For an agent to be truly useful over weeks or months, it needs a persistent memory layer. This is almost universally implemented via Vector Databases (like Pinecone, Milvus, or locally via Chroma).

Memory TypeImplementationRetention Trigger
Episodic (Session)Redis / Message HistoryKept for the duration of the conversation
Semantic (Knowledge)Vector Database (RAG)Saved when new facts are discovered
Procedural (Skills)Prompt / Tool Code updatesSaved by developers when agent makes systematic errors

Tracing the Recall: The most common failure mode in long-term memory is Retrieval Drift. An agent saves "User prefers dark mode" on Day 1. On Day 30, the user asks to switch to light mode. If your vector similarity search pulls up the Day 1 memory and ranks it higher than the Day 30 memory, the agent will refuse to change the setting. Tracing must include the timestamp and decay weight of the vector embeddings.

Detecting Retrieval Drift in Code

One practical fix is to include a timestamp field in every vector document and apply a time-decay penalty during retrieval:

import time
import math

def time_decay_score(base_score: float, created_at: float, half_life_days: int = 30) -> float:
    """
    Reduces the retrieval score of older memories.
    A memory created 30 days ago will have half the score of a fresh one.
    """
    days_old = (time.time() - created_at) / 86400
    decay_factor = math.exp(-0.693 * days_old / half_life_days)
    return base_score * decay_factor

# Example: Day-1 memory has base_score=0.91 but is 30 days old
day1_score = time_decay_score(0.91, created_at=time.time() - (30 * 86400))
# Result: 0.455 β€” the fresh Day-30 memory (score 0.74) now wins

5. Instrumenting Memory with LangSmith

LangSmith is LangChain's observability platform and provides first-class support for tracing memory operations across multi-step agent workflows. By wrapping your agent with a @traceable decorator, every memory read and write is captured as a named span.

from langsmith import traceable
from langchain_core.messages import HumanMessage

@traceable(name="agent_memory_read", run_type="retriever")
def retrieve_user_prefs(user_id: str, query: str) -> list:
    """
    All vector store queries inside this function appear as
    a child span in LangSmith with latency + token counts.
    """
    results = vector_store.similarity_search(
        query=query,
        filter={"user_id": user_id},
        k=5
    )
    return results

In the LangSmith trace view, you can visualize exactly which memory chunks were retrieved at each step, their similarity scores, and whether the agent actually used the retrieved context in its final answer. This is the most powerful debugging tool available for multi-step memory pipelines.

6. Memory Implementation Comparison

Memory PatternBest ForTracing ToolKey Risk
In-context WindowShort single-session tasksToken counterContext overflow
Redis Message HistoryMulti-turn chat sessionsLangSmith spansStale session data
Vector DB (RAG)Long-term knowledge retrievalRetrieval spans + scoresRetrieval Drift
LangGraph StateTyped multi-step pipelinesState diff at each nodeState explosion
Persistent CheckpointerLong-horizon agentic tasksCheckpoint diffsStorage cost at scale

Production Memory Tracing Checklist

  1. Log context size at every step β€” set an alert if the messages array exceeds 80% of the model's context limit.
  2. Inject timestamps into all vector documents β€” enables time-decay scoring and Retrieval Drift detection.
  3. Trace all similarity_search calls β€” record the top-k scores, not just the returned text.
  4. Use LangSmith or Langfuse for end-to-end span visibility across memory reads, tool calls, and model responses.
  5. Set a Summarizer Node threshold β€” trigger compression when the scratchpad exceeds 5 steps or 3,000 tokens.
  6. Test Retrieval Drift scenarios β€” seed your test suite with conflicting user preferences at different timestamps.

Conclusion

Managing memory isn't just about saving tokens; it's about curating attention. By treating memory as a typed State machine and injecting Summarizer Nodes, you prevent Context Decay and ensure your agent remains sharp on Step 50 as it was on Step 1.

Continue Reading

πŸ‘¨β€πŸ’»
Written by

Vivek

AI Engineer

Full-stack AI engineer with 4+ years building LLM-powered products, autonomous agents, and RAG pipelines. I've shipped AI features to production for startups and worked hands-on with GPT-4o, LangChain, LlamaIndex, and the Vercel AI SDK. I started OpnCrafter to share everything I wish I had when learning β€” no fluff, just working code and real-world context.

GPT-4oLangChainNext.jsVector DBsRAGVercel AI SDK