Tracing Memory and Context in Agent Workflows
Agents are fundamentally stateless. Every action they take requires re-injecting exactly the right amount of memory: too little, and they hallucinate; too much, and they get confused and exceed token limits.
The Memory Hierarchy
In production systems, agent memory is usually split into three layers:
- The Scratchpad (Short-Term): The step-by-step history of the current execution (e.g., "I just searched the web, and here is result #1").
- The User Session (Mid-Term): Conversation history over the last hour.
- The Global State (Long-Term): Enduring facts, vector database retrievals, and user preferences.
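The three layers above can be sketched as a single container object. This is an illustrative shape, not any specific framework's API; the class and field names are hypothetical:

```python
from dataclasses import dataclass, field

@dataclass
class AgentMemory:
    """Illustrative three-layer memory container (names are hypothetical)."""
    scratchpad: list[str] = field(default_factory=list)        # short-term: current run
    session_history: list[str] = field(default_factory=list)   # mid-term: conversation
    global_facts: dict[str, str] = field(default_factory=dict) # long-term: persisted

    def record_step(self, step: str) -> None:
        # Scratchpad entries live only for the duration of the current execution.
        self.scratchpad.append(step)

memory = AgentMemory()
memory.record_step("searched the web, got result #1")
memory.global_facts["user_pref_theme"] = "dark"
```

The key design point is that each layer has a different lifetime: the scratchpad is discarded when the run ends, the session history when the conversation ends, and only `global_facts` survives across sessions.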
Tracing Context Decay
"Context Decay" occurs when critical information present in step 1 gets pushed so far up the LLM's context window by step 15 that the model "forgets" it or ignores it entirely (often called the "Lost in the Middle" phenomenon).
Visualizing the Window
When tracing your agent, you must graph the size of the messages array over the duration of the workflow. If the array grows linearly with every tool call, your agent will inevitably crash or degrade in logic.
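A minimal way to capture that graph is to log one data point per step. The `trace_context_size` helper and the message-dict shape below are illustrative assumptions, not a specific framework's API:

```python
def trace_context_size(messages: list[dict], step: int, trace: list[dict]) -> None:
    """Append one data point per step so the growth curve can be plotted later."""
    trace.append({
        "step": step,
        "n_messages": len(messages),
        "n_chars": sum(len(m.get("content", "")) for m in messages),
    })

trace: list[dict] = []
messages = [{"role": "system", "content": "You are an agent."}]
for step in range(3):
    # Each tool call appends a new message; the array grows linearly.
    messages.append({"role": "tool", "content": f"tool output for step {step}"})
    trace_context_size(messages, step, trace)
```

If the plotted `n_messages` curve is a straight line with no plateaus, no summarization or pruning is happening and the context window is on a collision course with the token limit.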
Strategies for Context Management
1. The Rolling Summary
Instead of appending every raw tool output to the context window, inject a "Summarizer Node."
```python
def summarize_scratchpad(raw_scratchpad):
    # If the agent has taken more than 5 steps, compress the history.
    # `llm` is a placeholder for whatever completion function you use.
    if len(raw_scratchpad) > 5:
        summary_prompt = f"Summarize these past actions into 3 bullet points: {raw_scratchpad}"
        compressed = llm(summary_prompt)
        return compressed
    return raw_scratchpad
```
2. Selective Information Passing
If an agent uses a tool to scrape a 10,000-word webpage, do not inject the entire webpage into the primary agent's brain. Instead, spawn a sub-agent.
The Pattern: Main Agent asks Sub-Agent: "Read this URL and extract only the CEO's name." The Sub-Agent processes the massive context and returns a 2-word string back to the Main Agent's scratchpad.
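This pattern can be sketched as a function boundary: the sub-agent receives the full page, and only its short answer crosses back into the main agent's scratchpad. The function name is illustrative, and a trivial stub stands in for the model so the sketch runs without an API key:

```python
def sub_agent_extract(page_text: str, instruction: str, llm=None) -> str:
    """Sub-agent: sees the full page, returns only the extracted answer.

    `llm` is a placeholder for any completion function; the stub below
    pretends the model extracted the name from the page.
    """
    if llm is None:
        llm = lambda prompt: "Jane Doe"  # stub standing in for a real model call
    return llm(f"{instruction}\n\n{page_text}")

# The main agent never sees the 10,000-word page -- only the 2-word result.
page = "(imagine 10,000 words of scraped HTML here) ... CEO Jane Doe said ..."
scratchpad_entry = sub_agent_extract(page, "Extract only the CEO's name.")
```

The design win is isolation: the massive context is consumed inside a throwaway call frame, so it never pollutes the main agent's attention budget.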
3. Tracing Context with Graphs
Frameworks like LangGraph treat memory as a strictly defined State object. By tracing State transitions, you know exactly what variables the agent had access to at any given step.
```python
from typing import TypedDict, List

# By adhering to a strict State typing, tracing becomes trivial.
# We can dump this dict to a database at every step.
class AgentState(TypedDict):
    task_intent: str
    current_doc_chunks: List[str]
    errors_encountered: int
    final_answer: str
```
4. Long-Term State: The Vector DB Boundary
Short-term scratchpads are cleared when the agent terminates. For an agent to be truly useful over weeks or months, it needs a persistent memory layer. This is almost universally implemented via Vector Databases (like Pinecone, Milvus, or locally via Chroma).
| Memory Type | Implementation | Retention Trigger |
|---|---|---|
| Episodic (Session) | Redis / Message History | Kept for the duration of the conversation |
| Semantic (Knowledge) | Vector Database (RAG) | Saved when new facts are discovered |
| Procedural (Skills) | Prompt / Tool Code updates | Saved by developers when agent makes systematic errors |
Tracing the Recall: The most common failure mode in long-term memory is Retrieval Drift. An agent saves "User prefers dark mode" on Day 1. On Day 30, the user asks to switch to light mode. If your vector similarity search pulls up the Day 1 memory and ranks it higher than the Day 30 memory, the agent will refuse to change the setting. Tracing must include the timestamp and decay weight of the vector embeddings.
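One common mitigation for Retrieval Drift is to re-rank retrieved memories by similarity multiplied by an exponential recency decay. The sketch below assumes each retrieved memory carries a raw `similarity` score and an `age_days` delta; the field names and the 14-day half-life are illustrative choices, not a standard:

```python
import math

def rank_memories(memories: list[dict], half_life_days: float = 14.0) -> list[dict]:
    """Re-rank retrieved memories by similarity * exponential recency decay."""
    decay = math.log(2) / half_life_days  # a memory loses half its weight per half-life
    for m in memories:
        m["score"] = m["similarity"] * math.exp(-decay * m["age_days"])
    return sorted(memories, key=lambda m: m["score"], reverse=True)

candidates = [
    {"text": "User prefers dark mode", "similarity": 0.92, "age_days": 30},
    {"text": "User asked to switch to light mode", "similarity": 0.88, "age_days": 0},
]
ranked = rank_memories(candidates)
```

With this weighting, the Day-30 "dark mode" memory is discounted enough that the fresh "light mode" request wins the ranking, even though its raw similarity score is slightly lower.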
Conclusion
Managing memory isn't just about saving tokens; it's about curating attention. By treating memory as a typed State machine and injecting Summarizer Nodes, you prevent Context Decay and ensure your agent remains as sharp on Step 50 as it was on Step 1.