OpnCrafter
Module 2 of 10: Generative UI

React Server Components & Streaming UI

Jan 2, 2026 • 20 min read

Generative UI requires a rendering model that sends content to the browser as it's produced, not after everything is ready. A standard HTTP request waits until the full response is assembled, then sends it all at once — this is catastrophic for LLM responses that stream tokens over 5-10 seconds. React Server Components with HTTP chunked transfer encoding solve exactly this problem: the server sends UI chunks to the browser incrementally, making LLM-powered interfaces feel alive and responsive.

1. The Core Problem: Why Streaming Matters Psychologically

Users have well-calibrated latency expectations from decades of web browsing. Even a 2-second delay feels unacceptable for most interactions — users assume something is broken and refresh or abandon. LLMs take 2-10 seconds to produce substantive responses. The solution isn't making LLMs faster (that's an infrastructure problem) — it's making the wait feel shorter through progressive disclosure.

❌ Without streaming

User submits → 0s: blank screen
→ 2s: still blank
→ 5s: user thinks it's broken
→ 7s: full response appears
Perceived as slow or broken, even when the backend is fast

✅ With streaming

User submits → 0ms: skeleton appears
→ 200ms: "Analyzing your request..."
→ 1s: text starts streaming in
→ 5s: chart appears
Same total time, feels instant

2. HTTP Chunked Transfer Encoding: The Technical Foundation

# Standard HTTP request (atomic — full response before sending):
POST /api/chat → Server processes everything → HTTP/1.1 200 OK + full body → Browser renders

# Chunked Transfer Encoding (streaming - sends pieces as ready):
POST /api/chat → Server begins work → HTTP/1.1 200 OK
                                       Transfer-Encoding: chunked
                                       
                                       → Chunk 1: <Skeleton /> shell markup (50ms)
                                       → Chunk 2: "Searching database..." (200ms)
                                       → Chunk 3: token by token text (1000ms)
                                       → Chunk 4: <StockChart /> (3000ms)
                                       → 0 (final zero-length chunk = stream complete)

# In Next.js App Router, streaming happens automatically:
# - Server Components can use async/await — their output streams as resolved
# - Suspense boundaries define what streams together vs. independently
# - React's streaming renderer (renderToPipeableStream) handles the chunking

# Verification: Open Chrome DevTools → Network
# Look for the chat POST request → Response tab
# You'll see the RSC payload format (not readable JSON):
# 0:["$@1",["$","div",null,{"children":"Loading..."}]]
# 1:"$Sreact.suspense"
# (This is the React Flight format — correctly indicates streaming is working)
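The same mechanism is easy to observe outside Next.js. Below is a minimal sketch in plain Node.js (the strings, delays, and the `streamDemo` helper are all made up for illustration): a server streams three chunks with artificial delays, and a `fetch` client reads each chunk as it arrives instead of waiting for the full body.

```typescript
import http from "node:http";
import { once } from "node:events";
import type { AddressInfo } from "node:net";

// A server that streams three chunks, and a client that reads them
// incrementally. Node switches to Transfer-Encoding: chunked on its own
// when we write incrementally without setting a Content-Length.
async function streamDemo(): Promise<string[]> {
  const server = http.createServer(async (_req, res) => {
    res.writeHead(200, { "Content-Type": "text/plain" });
    for (const piece of ["skeleton|", "searching...|", "final text"]) {
      res.write(piece); // each write is flushed as its own HTTP chunk
      await new Promise((r) => setTimeout(r, 30));
    }
    res.end(); // sends the terminating zero-length chunk
  });
  server.listen(0);
  await once(server, "listening");
  const { port } = server.address() as AddressInfo;

  // Client side: read the body chunk by chunk as it arrives.
  const res = await fetch(`http://127.0.0.1:${port}/`);
  const reader = res.body!.getReader();
  const decoder = new TextDecoder();
  const chunks: string[] = [];
  for (;;) {
    const { value, done } = await reader.read();
    if (done) break;
    chunks.push(decoder.decode(value, { stream: true }));
  }
  server.close();
  return chunks;
}

const chunks = await streamDemo();
console.log(chunks.length, "chunks received");
```

The key point is on the server: each `res.write` produces a chunk the client can render immediately, which is exactly what React's streaming renderer does with UI payloads.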

3. React Suspense: Declarative Streaming Boundaries

// app/chat/page.tsx (Server Component)
// Suspense boundaries define INDEPENDENT streaming regions
import { Suspense } from 'react';

export default function ChatPage({ params }: { params: { chatId: string } }) {
    return (
        <div className="chat-layout">
            {/* 1. Shell renders IMMEDIATELY (0ms) — not async */}
            <ChatHeader />
            <VoiceButton />  {/* Interactive immediately */}

            {/* 2. Past messages load from DB — streams independently */}
            {/* While loading: shows skeleton. Doesn't block input area. */}
            <Suspense fallback={<MessageListSkeleton count={5} />}>
                <PastMessagesList chatId={params.chatId} />
            </Suspense>

            {/* 3. Sidebar loads independently — doesn't block chat */}
            <Suspense fallback={<SidebarSkeleton />}>
                <ConversationHistory />
            </Suspense>

            {/* 4. Input is immediately interactive — ready before messages load */}
            <ChatInput />
        </div>
    );
}

// ANTI-PATTERN: Don't put everything in one Suspense
// This makes the ENTIRE page wait for the slowest component
function BadExample() {
    return (
        <Suspense fallback={<FullPageSpinner />}>
            <div>
                <ChatHeader />      {/* These don't need to wait... */}
                <ChatInput />       {/* ...but they're blocked by PastMessages */}
                <PastMessagesList />  {/* This is the slow one */}
            </div>
        </Suspense>
    );
}

// CORRECT: Granular Suspense = parallel streaming
function GoodExample() {
    return (
        <div>
            <ChatHeader />  {/* Instant */}
            <Suspense fallback={<Skeleton />}>
                <PastMessagesList />  {/* Only this part is blocked */}
            </Suspense>
            <ChatInput />  {/* Instant — user can type while messages load */}
        </div>
    );
}
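The timing difference is easy to model without React. In this toy simulation (the component names, delays, and both functions are hypothetical), a single boundary produces one paint after the slowest child resolves, while granular boundaries paint each section in completion order:

```typescript
// Hypothetical per-component load times — header instant, messages slowest.
const load = (name: string, ms: number) =>
  new Promise<string>((resolve) => setTimeout(() => resolve(name), ms));

// One big boundary: nothing paints until the slowest child resolves.
async function singleBoundary(): Promise<string[]> {
  const all = await Promise.all([
    load("header", 0),
    load("sidebar", 30),
    load("messages", 90),
  ]);
  return [all.join("+")]; // a single combined paint at ~90ms
}

// Granular boundaries: each section paints as soon as it resolves.
async function granularBoundaries(): Promise<string[]> {
  const paints: string[] = [];
  await Promise.all(
    [load("header", 0), load("sidebar", 30), load("messages", 90)].map((p) =>
      p.then((name) => {
        paints.push(name); // paints land in completion order
      })
    )
  );
  return paints;
}

console.log(await singleBoundary());     // one late paint
console.log(await granularBoundaries()); // three paints, fastest first
```

Total work is identical in both versions; only the paint schedule changes — which is the whole argument for granular boundaries.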

4. createStreamableUI: Background Async Updates

// The most powerful streaming primitive in the Vercel AI SDK
// Allows updating UI from WITHIN background async work
// Even after the Server Action has already "returned"

'use server';
import { createStreamableUI } from 'ai/rsc';

export async function runStockAnalysis(symbol: string) {
    // 1. Create a mutable UI stream — start with loading state
    const ui = createStreamableUI(
        <div style={{ color: 'var(--text-secondary)', display: 'flex', alignItems: 'center', gap: '0.5rem' }}>
            <Spinner size="sm" />
            <span>Initializing analysis for {symbol}...</span>
        </div>
    );

    // 2. Run async work CONCURRENTLY (IIFE pattern)
    // The IIFE runs in the background — Server Action returns immediately below
    (async () => {
        try {
            // Step A: Fetch price data
            ui.update(
                <div>
                    <Spinner size="sm" /> Fetching 90 days of price data...
                </div>
            );
            const priceHistory = await fetchPriceHistory(symbol, '90d');

            // Step B: Fetch news
            ui.update(
                <div>
                    <Spinner size="sm" /> Searching for recent news ({symbol})...
                    <PricePreview data={priceHistory} />
                </div>
            );
            const recentNews = await fetchRelevantNews(symbol);

            // Step C: Run LLM analysis
            ui.update(
                <div>
                    <Spinner size="sm" /> Generating AI commentary...
                    <PriceChart data={priceHistory} />
                    <NewsList items={recentNews} />
                </div>
            );
            const analysis = await generateAnalysis(priceHistory, recentNews, symbol);

            // Step D: Finalize — replace entire UI with complete result
            ui.done(
                <div>
                    <PriceChart data={priceHistory} showAnnotations />
                    <AnalysisSummary text={analysis} />
                    <NewsList items={recentNews} expanded />
                </div>
            );
        } catch (error) {
            // Always handle errors to avoid hanging streams
            ui.done(
                <div style={{ color: '#f87171' }}>
                    Analysis failed: {error instanceof Error ? error.message : 'Unknown error'}. Please try again.
                </div>
            );
        }
    })();

    // 3. Return IMMEDIATELY — client receives the streamable value
    // Background IIFE continues updating the same UI node asynchronously
    return { id: Date.now(), display: ui.value };
}

// Key insight: ui.value is a special React node that "subscribes"
// to updates from the server via the RSC streaming connection.
// Client renders it like a normal component — updates appear automatically.
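The mechanics can be sketched without React at all. This toy `createStreamable` (a hypothetical stand-in, not the AI SDK API) keeps the same `update()`/`done()` contract but yields plain strings over an async iterator instead of React nodes over the RSC wire:

```typescript
// Toy stand-in for createStreamableUI: a mutable value that a background
// task can keep updating after the creating function has returned.
function createStreamable<T>(initial: T) {
  const queue: T[] = [initial];
  let wake: (() => void) | null = null;
  let closed = false;
  const push = (value: T) => {
    queue.push(value);
    wake?.(); // wake a waiting consumer, if any
  };
  return {
    update: (value: T) => push(value),
    done: (value: T) => {
      push(value);
      closed = true;
      wake?.();
    },
    async *[Symbol.asyncIterator]() {
      for (;;) {
        while (queue.length > 0) yield queue.shift()!;
        if (closed) return;
        await new Promise<void>((resolve) => (wake = resolve));
      }
    },
  };
}

// Like the Server Action: return immediately, keep updating from an IIFE.
function runAnalysis() {
  const ui = createStreamable("Initializing...");
  (async () => {
    await new Promise((r) => setTimeout(r, 10));
    ui.update("Fetching price data...");
    await new Promise((r) => setTimeout(r, 10));
    ui.done("Final analysis ready");
  })();
  return ui; // caller receives the stream before the work finishes
}

// The "client" observes every intermediate state, in order.
for await (const state of runAnalysis()) console.log(state);
```

In the real primitive, the consumer is React on the client and the transport is the RSC streaming connection, but the shape is the same: return a subscribable value now, mutate it later.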

Frequently Asked Questions

Does streaming work on Vercel Edge Runtime or only Node.js?

The Vercel AI SDK's streaming primitives work on both runtimes, but with important limitations. Edge Runtime blocks most Node.js APIs (file system, native modules, some npm packages). If your streaming action uses Prisma (requires Node.js), heavy npm dependencies, or file system access, you must use the Node.js runtime: export const runtime = 'nodejs' in the route or page file. Edge Runtime is excellent for pure streaming text (LLM completions with no DB) — it has lower cold-start latency. Most production Generative UI apps with database persistence use Node.js runtime and set maxDuration = 60 to allow for long LLM operations.
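In the App Router, both switches are route segment config exports at the top of the file that runs the streaming action (the path here is illustrative):

```typescript
// app/api/chat/route.ts — route segment config
export const runtime = 'nodejs';  // required for Prisma, fs, native modules
export const maxDuration = 60;    // seconds; subject to your plan's limits
```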

Why does my streaming stop halfway through on Vercel free tier?

Vercel's free tier enforces a 10-second function execution timeout by default. LLM responses generating 500+ tokens can easily exceed this, especially with additional data fetching before the LLM call. Solutions: (1) Upgrade to the Pro plan (60-second default, with longer limits available for long-running functions). (2) Start streaming as early as possible — bytes already sent reach the user, but the function is still terminated at the limit, so streaming alone does not extend the timeout. (3) Restructure: fetch data first, then stream only the LLM call, minimizing total execution time. (4) Raise maxDuration (e.g. maxDuration = 300 on Pro) for long-running analysis tasks. Always test with a stopwatch against your tier's actual limits.

Conclusion

HTTP chunked transfer encoding and React Suspense form the technical foundation of Generative UI. Streaming transforms 7-second blank-screen waits into engaging, progressive experiences where users see skeleton shapes immediately, status updates throughout, and content appearing piece by piece. The createStreamableUI primitive extends this further by allowing background async work to update the same UI node after the Server Action has returned — enabling the skeleton → intermediate state → final component animation pattern that makes AI interfaces feel alive. Granular Suspense boundaries (one per independently-loading section) ensure that slow components don't block fast ones, keeping the interface interactive even while complex AI operations run in parallel.

👨‍💻
Written by

Vivek

AI Engineer

Full-stack AI engineer with 4+ years building LLM-powered products, autonomous agents, and RAG pipelines. I've shipped AI features to production for startups and worked hands-on with GPT-4o, LangChain, LlamaIndex, and the Vercel AI SDK. I started OpnCrafter to share everything I wish I had when learning — no fluff, just working code and real-world context.

GPT-4o · LangChain · Next.js · Vector DBs · RAG · Vercel AI SDK
