
Tracing Memory and Context in Agent Workflows

Agents are fundamentally stateless. Every action they take requires re-injecting exactly the right amount of memory. Too little, and they hallucinate; too much, and they get confused or exceed token limits.

The Memory Hierarchy

In production systems, agent memory is usually split into three layers:

  1. The Scratchpad (Short-Term): The step-by-step history of the current execution (e.g., "I just searched the web, and here is result #1").
  2. The User Session (Mid-Term): Conversation history over the last hour.
  3. The Global State (Long-Term): Enduring facts, vector database retrievals, and user preferences.
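These three layers can be sketched as a single container that each step draws from. A minimal illustration; the class and method names here are my own, not from any particular framework:

```python
from dataclasses import dataclass, field

@dataclass
class AgentMemory:
    scratchpad: list[str] = field(default_factory=list)       # short-term: current run
    session_history: list[str] = field(default_factory=list)  # mid-term: this conversation
    global_facts: dict[str, str] = field(default_factory=dict)  # long-term: enduring facts

    def build_context(self, max_scratchpad_steps: int = 5) -> str:
        """Assemble a prompt context, keeping only recent scratchpad steps."""
        parts = [f"FACT: {k} = {v}" for k, v in self.global_facts.items()]
        parts += self.session_history
        parts += self.scratchpad[-max_scratchpad_steps:]
        return "\n".join(parts)
```

Capping the scratchpad by step count is the simplest possible policy; real systems usually budget by tokens instead.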

Tracing Context Decay

"Context Decay" occurs when critical information present in step 1 gets pushed so far up the LLM's context window by step 15 that the model "forgets" it or ignores it entirely (often called the "Lost in the Middle" phenomenon).

Visualizing the Window

When tracing your agent, graph the size of the messages array over the duration of the workflow. If the array grows linearly with every tool call, your agent will inevitably crash or degrade in logic.
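A minimal sketch of that tracing, using a crude word count as a stand-in for a real tokenizer (swap in something like tiktoken in practice):

```python
def count_tokens(text: str) -> int:
    # Crude word-count proxy; replace with a real tokenizer.
    return len(text.split())

def trace_context_size(messages: list[dict], step: int, trace_log: list[dict]) -> None:
    # Log message count and total token load after every tool call,
    # so linear growth is visible before it causes failures.
    total = sum(count_tokens(m["content"]) for m in messages)
    trace_log.append({"step": step, "messages": len(messages), "tokens": total})

trace_log: list[dict] = []
messages: list[dict] = []
for step in range(3):
    messages.append({"role": "tool", "content": "result " * (step + 1)})
    trace_context_size(messages, step, trace_log)
# Plot trace_log["tokens"] per step to spot linear growth.
```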

Strategies for Context Management

1. The Rolling Summary

Instead of appending every raw tool output to the context window, inject a "Summarizer Node."

def summarize_scratchpad(raw_scratchpad):
    # If the agent has taken more than 5 steps, compress the history.
    # `llm` is a placeholder for your model call (e.g. an OpenAI or
    # LangChain client); the summary replaces the raw step list.
    if len(raw_scratchpad) > 5:
        summary_prompt = f"Summarize these past actions into 3 bullet points: {raw_scratchpad}"
        compressed = llm(summary_prompt)
        return [compressed]  # history collapses to a single summary entry
    return raw_scratchpad
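In practice the trigger is usually a token budget rather than a fixed step count. A sketch of that variant; the tokenizer stand-in and the budget value are my own assumptions:

```python
def count_tokens(text: str) -> int:
    # Crude stand-in for a real tokenizer such as tiktoken.
    return len(text.split())

def should_compress(scratchpad: list[str], token_budget: int = 2000) -> bool:
    # Trigger summarization on token pressure, not just step count,
    # so a few huge tool outputs can't silently blow the window.
    return sum(count_tokens(step) for step in scratchpad) > token_budget
```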

2. Selective Information Passing

If an agent uses a tool to scrape a 10,000-word webpage, do not inject the entire webpage into the primary agent's brain. Instead, spawn a sub-agent.

The Pattern: Main Agent asks Sub-Agent: "Read this URL and extract only the CEO's name." The Sub-Agent processes the massive context and returns a 2-word string back to the Main Agent's scratchpad.
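The pattern can be sketched as follows; the function names and the trivial "extraction" are illustrative stand-ins for real LLM calls:

```python
def sub_agent_extract(page_text: str, question: str) -> str:
    # In practice this is its own LLM call with `page_text` as context.
    # Here, a trivial stand-in that pulls a known field from the page.
    for line in page_text.splitlines():
        if "CEO:" in line:
            return line.split("CEO:", 1)[1].strip()
    return "unknown"

def main_agent_step(scratchpad: list[str], page_text: str) -> None:
    # The 10,000-word page never enters the main agent's context;
    # only the two-word result lands on the scratchpad.
    answer = sub_agent_extract(page_text, "Who is the CEO?")
    scratchpad.append(f"CEO name: {answer}")

scratchpad: list[str] = []
page = "About Us\nCEO: Jane Doe\n" + "filler text " * 5000
main_agent_step(scratchpad, page)
```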

3. Tracing Context with Graphs

Frameworks like LangGraph treat memory as a strictly defined State object. By tracing State transitions, you know exactly what variables the agent had access to at any given microsecond.

from typing import TypedDict, List

# By adhering to a strict State typing, tracing becomes trivial.
# We can dump this dict to a database at every step.
class AgentState(TypedDict):
    task_intent: str
    current_doc_chunks: List[str]
    errors_encountered: int
    final_answer: str
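Building on that idea, here is a hedged sketch of the per-step dump: serialize the typed state after every transition, with an in-memory JSON list standing in for a real database write:

```python
import json
from typing import TypedDict, List

class AgentState(TypedDict):
    task_intent: str
    current_doc_chunks: List[str]
    errors_encountered: int
    final_answer: str

trace_db: list[str] = []

def record_state(step: int, state: AgentState) -> None:
    # Serialize the full state so a trace viewer can replay exactly
    # what the agent "knew" at each step.
    trace_db.append(json.dumps({"step": step, **state}))

state: AgentState = {
    "task_intent": "find CEO name",
    "current_doc_chunks": [],
    "errors_encountered": 0,
    "final_answer": "",
}
record_state(0, state)
state["final_answer"] = "Jane Doe"
record_state(1, state)
```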

4. Long-Term State: The Vector DB Boundary

Short-term scratchpads are cleared when the agent terminates. For an agent to be truly useful over weeks or months, it needs a persistent memory layer. This is almost universally implemented via Vector Databases (like Pinecone, Milvus, or locally via Chroma).

| Memory Type | Implementation | Retention Trigger |
| --- | --- | --- |
| Episodic (Session) | Redis / Message History | Kept for the duration of the conversation |
| Semantic (Knowledge) | Vector Database (RAG) | Saved when new facts are discovered |
| Procedural (Skills) | Prompt / Tool Code updates | Saved by developers when the agent makes systematic errors |

Tracing the Recall: The most common failure mode in long-term memory is Retrieval Drift. An agent saves "User prefers dark mode" on Day 1. On Day 30, the user asks to switch to light mode. If your vector similarity search pulls up the Day 1 memory and ranks it higher than the Day 30 memory, the agent will refuse to change the setting. Tracing must include the timestamp and decay weight of the vector embeddings.
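One way to correct for this is to weight similarity by recency. A sketch using an exponential decay; the half-life constant is an illustrative choice, not a library default:

```python
import math

def decayed_score(similarity: float, age_days: float, half_life_days: float = 7.0) -> float:
    # Multiply raw cosine similarity by an exponential decay on the
    # memory's age, so fresh contradicting memories can outrank stale ones.
    decay = math.exp(-math.log(2) * age_days / half_life_days)
    return similarity * decay

# Day-1 memory ("prefers dark mode"): near-perfect match, but 29 days old.
old = decayed_score(similarity=0.95, age_days=29)
# Day-30 memory ("switch to light mode"): weaker match, but fresh.
new = decayed_score(similarity=0.80, age_days=0)
# With decay applied, the fresh memory wins the ranking.
```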

Conclusion

Managing memory isn't just about saving tokens; it's about curating attention. By treating memory as a typed State machine and injecting Summarizer Nodes, you prevent Context Decay and ensure your agent remains as sharp on Step 50 as it was on Step 1.

Written by

Vivek

AI Engineer

Full-stack AI engineer with 4+ years building LLM-powered products, autonomous agents, and RAG pipelines. I've shipped AI features to production for startups and worked hands-on with GPT-4o, LangChain, LlamaIndex, and the Vercel AI SDK. I started OpnCrafter to share everything I wish I had when learning: no fluff, just working code and real-world context.
