Managing Latency with Suspense
Jan 2, 2026 • 15 min read
AI is slow. A complex reasoning chain (for example, search followed by analysis) can take 10+ seconds. In traditional web development, 10 seconds of latency is a death sentence. In AI development, we mask it with skeletons.
1. The Psychology of Perceived Performance
Users are willing to wait if they know what is happening. There are three levels of waiting states:
Blank screen: Panic. "Is it broken?" Users leave after 2s.
Spinner: Better. "It's working." Users wait 5-8s.
Skeleton: Best. "It's loading the chart." Users wait 10s+.
2. Implementation: The Generator Pattern
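Because our tool handlers are async generator functions (the `generate` callback passed to the Vercel AI SDK's `streamUI()`), implementing a skeleton is essentially free. We yield the skeleton synchronously, before the first `await`, so it streams to the browser while the slow work runs.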
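To see why the skeleton arrives first, here is a minimal plain-JavaScript model of the same control flow, with no React or AI SDK involved. It is a sketch under stated assumptions: strings stand in for the JSX elements, and `fakeSearch` and `collect` are hypothetical helpers (not SDK functions) that simulate a 200ms search and record when each value comes out of the generator.

```javascript
// Plain-JS model of the skeleton-first pattern: no React, no AI SDK.
// Strings stand in for JSX elements; fakeSearch is a hypothetical
// stand-in for a slow call such as a web search.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function fakeSearch(query) {
  await sleep(200); // simulated latency
  return [`result for ${query}`];
}

async function* generate({ query }) {
  // 1. Yield the placeholder before the first await, so it is ready at once.
  yield `skeleton:${query}`;
  // 2. Do the slow work.
  const results = await fakeSearch(query);
  // 3. Return the final UI (the generator's return value, not a yield).
  return `results:${results.join(", ")}`;
}

// Drive the generator by hand: for await...of would discard the return value.
async function collect(gen) {
  const start = Date.now();
  const frames = [];
  for (;;) {
    const { value, done } = await gen.next();
    frames.push({ value, atMs: Date.now() - start });
    if (done) return frames;
  }
}

collect(generate({ query: "nvda" })).then((frames) => console.log(frames));
```

Running it logs two frames: the skeleton at roughly 0ms and the results roughly 200ms later. `collect` calls `next()` manually because a `for await...of` loop ignores a generator's `return` value, and in this pattern the returned value is the final UI.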
generate: async function* ({ query }) {
  // Assumes `tavily` is an initialized Tavily search client and that
  // SearchResultsSkeleton / SearchResults are your own components.

  // 1. YIELD IMMEDIATELY (< 50ms)
  // This flushes a chunk to the browser instantly.
  yield <SearchResultsSkeleton query={query} />;

  // 2. DO THE WORK (~2000ms)
  // The user stares at the skeleton, reading the "query" text.
  const results = await tavily.search(query);

  // 3. RETURN FINAL UI
  // React reconciles the DOM, replacing the skeleton with the results.
  return <SearchResults data={results} />;
}
3. Code: Building a Shimmer Effect
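A static gray box is boring. We want a "shimmer" effect to indicate activity. Tailwind CSS has a built-in utility for this: `animate-pulse`. Strictly speaking, it pulses the element's opacity rather than sweeping a highlight across it, but it signals activity just as well.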
// StockSkeleton.js
export function StockSkeleton() {
  // Note: bg-tertiary is a custom theme color, not a Tailwind default.
  // Keep h-[300px] equal to the final chart card's height (see CLS below).
  return (
    <div className="rounded-lg border border-white/10 p-4 w-full h-[300px] animate-pulse bg-tertiary">
      {/* Header Area */}
      <div className="flex justify-between items-center mb-4">
        <div className="h-6 w-32 bg-white/10 rounded"></div>
        <div className="h-6 w-16 bg-white/10 rounded"></div>
      </div>
      {/* Chart Area (Big Block) */}
      <div className="h-[200px] w-full bg-white/5 rounded flex items-end gap-2 p-2">
        {/* Fake bars */}
        <div className="h-[40%] w-full bg-white/5 rounded-t"></div>
        <div className="h-[70%] w-full bg-white/5 rounded-t"></div>
        <div className="h-[50%] w-full bg-white/5 rounded-t"></div>
        <div className="h-[80%] w-full bg-white/5 rounded-t"></div>
      </div>
    </div>
  );
}
4. The Enemy: CLS (Cumulative Layout Shift)
Avoid the "Jump"
If your skeleton is 100px tall but your loaded content is 300px tall, the chat interface will "jump" when the content loads. This is especially jarring when the user has scrolled up to read the history.
Rule: Your Skeleton must be the EXACT same height as your final component.
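To put a number on the jump, here is a small sketch of the per-shift score that the CLS metric is built from (impact fraction × distance fraction). It assumes a deliberately simplified case: one full-width block of content that is pushed straight down and stays inside the viewport. The 500×1000px viewport and the 400px-tall block are hypothetical values, not measurements.

```javascript
// Layout shift score for one shift, following the CLS definition:
//   score = impact fraction × distance fraction
// Simplified model (assumption): a single full-width element that moves
// straight down and stays entirely inside the viewport before and after.
function layoutShiftScore({ viewportWidth, viewportHeight, elementHeight, shiftPx }) {
  // Impact region = union of the element's box before and after the move.
  // Overlapping boxes span height + shift; disjoint boxes cover 2 × height.
  const unionHeight = elementHeight + Math.min(shiftPx, elementHeight);
  const impactFraction = unionHeight / viewportHeight; // full width, so widths cancel
  // Distance fraction = move distance / the viewport's largest dimension.
  const distanceFraction = shiftPx / Math.max(viewportWidth, viewportHeight);
  return impactFraction * distanceFraction;
}

// Skeleton 100px, final card 300px: the content next to it moves 200px.
const score = layoutShiftScore({
  viewportWidth: 500,
  viewportHeight: 1000,
  elementHeight: 400,
  shiftPx: 200,
});
console.log(score.toFixed(2)); // "0.12"
```

Under those assumptions, the 100px-to-300px swap scores about 0.12 on its own, already above the 0.1 that Core Web Vitals treats as "good" CLS. A skeleton of matching height moves nothing and scores 0.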
5. Granular Suspense
Don't wrap your entire chat in one Suspense boundary. If the LLM is streaming text AND generating a chart, the text should appear while the chart is loading.
The Vercel AI SDK handles this automatically if you stream the text part separately from the tool call part.
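The difference between one boundary and several can be modelled without React. The sketch below is a plain-JavaScript analogy, not how React implements Suspense: each slot is an async loader, `"loading"` stands for a fallback that is still visible, and `onUpdate` receives a snapshot of what the user would see. The slot names, the helper functions, and the 20ms/200ms timings are hypothetical.

```javascript
// Plain-JS analogy for Suspense granularity (not React's implementation).
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// One boundary around everything: nothing is shown until every slot resolves.
async function renderSingleBoundary(slots, onUpdate) {
  const names = Object.keys(slots);
  onUpdate(Object.fromEntries(names.map((n) => [n, "loading"])));
  const values = await Promise.all(names.map((n) => slots[n]()));
  const final = Object.fromEntries(names.map((n, i) => [n, values[i]]));
  onUpdate(final);
  return final;
}

// One boundary per slot: each slot swaps in as soon as its own data is ready.
async function renderGranular(slots, onUpdate) {
  const state = Object.fromEntries(Object.keys(slots).map((n) => [n, "loading"]));
  onUpdate({ ...state });
  await Promise.all(
    Object.entries(slots).map(async ([name, load]) => {
      state[name] = await load();
      onUpdate({ ...state }); // snapshot so later updates don't mutate it
    })
  );
  return state;
}

// Hypothetical timings: streamed text is ready after 20ms, the chart after 200ms.
const slots = {
  text: async () => { await sleep(20); return "Here is NVDA's last month:"; },
  chart: async () => { await sleep(200); return "<chart>"; },
};

renderGranular(slots, (frame) => console.log(frame));
```

With one boundary, the text stays hidden behind the fallback for the full 200ms the chart needs; with one boundary per slot, the text appears after about 20ms while the chart keeps its own placeholder.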
Conclusion of Part 1
We have covered the infrastructure: React Server Components (RSCs), the Vercel AI SDK, tool rendering, and loading states. You now have the foundation to build any generative UI. In the next module, we will start building specific, sophisticated components, beginning with Interactive Charts.