Building Production Apps with the Claude API (Step-by-Step)
When I first shifted from OpenAI's API to Anthropic's Claude API, I made a massive mistake: I assumed they functioned exactly the same way. I spent three days fighting with system prompts, scratching my head over why Claude was hallucinating, and struggling to get streaming to work cleanly.
The reality is that Anthropic's API is fundamentally different. It forces you to be a better engineer. While OpenAI is notoriously forgiving of sloppy prompting and badly formatted JSON, Claude is a precision instrument. If you feed it garbage, it won't try to guess your intent—it will fail. But if you structure your Messages API calls correctly, Claude 3.5 Sonnet will outperform GPT-4o on complex coding logic by a mile.
In this masterclass, I am going to walk you through exactly how I build production-grade, streaming, Next.js applications using the Anthropic Claude API. We aren't building a toy "hello world" chatbot. We are building a robust architecture that handles system prompts correctly, manages streaming cleanly, and uses XML tags the way Claude's training data expects.
Phase 1: Understanding the Messages API Protocol
The core differentiator between Claude and other LLMs is how you structure the prompt. Anthropic uses the Messages API, which strictly enforces a conversational structure. Unlike OpenAI, which lets you put the system prompt inside the messages array, Anthropic explicitly separates the system prompt into its own top-level system parameter.
If you try to pass a 'system' role inside the messages array, the API will throw an error immediately. This forced separation is actually smart design: because user content can never occupy the system slot, it is much harder for a user's message to masquerade as system-level instructions.
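To make the contrast concrete, here is a minimal sketch of the two request shapes (the model ID and prompt text are just placeholders):

```javascript
// WRONG for Anthropic: a "system" role inside messages is rejected by the API.
// (This shape is what OpenAI's Chat Completions endpoint accepts.)
const openAiStyle = {
  messages: [
    { role: "system", content: "You are an expert technical writer." },
    { role: "user", content: "Write about React Server Components." },
  ],
};

// RIGHT for Anthropic: system is its own top-level parameter,
// and messages contains only user/assistant turns.
const anthropicStyle = {
  model: "claude-3-5-sonnet-20241022",
  max_tokens: 1024,
  system: "You are an expert technical writer.",
  messages: [
    { role: "user", content: "Write about React Server Components." },
  ],
};

console.log(anthropicStyle.system);
```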
The Standard vs. Streaming Call
Let's look at the basic Node.js SDK setup. First, install the SDK: npm install @anthropic-ai/sdk
import Anthropic from '@anthropic-ai/sdk';
const anthropic = new Anthropic({
apiKey: process.env.ANTHROPIC_API_KEY,
});
// A standard, blocking API call (NOT recommended for UX)
async function generatePost() {
const msg = await anthropic.messages.create({
model: "claude-3-5-sonnet-20241022",
max_tokens: 1024,
system: "You are an expert technical writer. Output your response wrapped in <markdown> tags.",
messages: [
{ role: "user", content: "Write a short paragraph about React Server Components." }
],
});
console.log(msg.content[0].text);
}
While this works for background cron jobs, you should never use a blocking call like this for a user-facing application. Claude 3.5 Sonnet generating 4,000 output tokens might take 20 seconds. Your user will stare at a loading spinner, assume the app is broken, and leave. You must stream.
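When you do stream, the API emits typed events instead of one final message, and your job is to pull the text deltas out of them. Here is a minimal sketch of that extraction logic; the sample events are synthetic, but they follow the shape of Anthropic's content_block_delta / text_delta streaming events (with the real SDK you would receive them from a streamed response):

```javascript
// Accumulate text from Anthropic-style streaming events.
// Only content_block_delta events with a text_delta carry visible text;
// message_start / message_stop and friends are skipped.
function extractText(events) {
  let out = "";
  for (const event of events) {
    if (event.type === "content_block_delta" && event.delta?.type === "text_delta") {
      out += event.delta.text;
    }
  }
  return out;
}

// Synthetic sample stream, shaped like the real event sequence.
const sample = [
  { type: "message_start" },
  { type: "content_block_delta", delta: { type: "text_delta", text: "Hello" } },
  { type: "content_block_delta", delta: { type: "text_delta", text: ", world" } },
  { type: "message_stop" },
];

console.log(extractText(sample)); // logs "Hello, world"
```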
Phase 2: Building a Next.js Streaming Route
To build a modern AI wrapper, we need two pieces: the Next.js API Route (Backend) and the React View (Frontend). Let's build the API route first. We will use the Vercel AI SDK (the ai package) together with its Anthropic provider. This abstracts away the nasty Server-Sent Events (SSE) parsing you used to have to do manually.
Run: npm install ai @ai-sdk/anthropic
The Backend: app/api/chat/route.js
This is where the magic happens. Notice how we use the streamText function. This creates a readable stream that pipes the generated tokens directly to the client as fast as Anthropic spits them out.
import { anthropic } from '@ai-sdk/anthropic';
import { streamText } from 'ai';
export const maxDuration = 60; // DO NOT FORGET THIS ON VERCEL!
export async function POST(req) {
// Extract messages from the frontend request
const { messages } = await req.json();
// Call Claude with streaming enabled
const result = await streamText({
model: anthropic('claude-3-5-sonnet-20241022'),
system: `You are a Senior Frontend Engineer.
Always wrap your code in standard markdown code blocks.
Think step-by-step inside <thinking> tags before writing your final solution.`,
messages: messages,
temperature: 0.2, // Lower temperature for coding tasks (more deterministic)
});
// Return a streaming response back to the client
return result.toDataStreamResponse();
}
The Vercel Serverless Timeout Death Trap
See that export const maxDuration = 60; line at the top? I once lost an entire weekend debugging why my Claude responses were truncating halfway through. By default, Vercel Serverless Functions time out after 10 or 15 seconds (depending on your plan). Because Claude takes time to stream a long response, the serverless function was dying mid-stream. Explicitly extending the duration is mandatory.
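If you prefer keeping this out of the route file, Vercel also lets you set the limit per function in vercel.json; the path pattern below is an assumption about this project's layout, so adjust it to match your route:

```json
{
  "functions": {
    "app/api/chat/route.js": {
      "maxDuration": 60
    }
  }
}
```

Either approach works; the in-file export just keeps the timeout visible right next to the streaming code it protects.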
The Frontend: React Client Implementation
Now that our backend is established, we need the frontend. The Vercel AI SDK provides a legendary hook called useChat. It manages the local message-array state, the form submission, and appending streamed chunks to the UI automatically.
'use client';
import { useChat } from 'ai/react';
import { useRef, useEffect } from 'react';
export default function ChatInterface() {
const { messages, input, handleInputChange, handleSubmit, isLoading } = useChat({
api: '/api/chat', // Points to the route we just built
});
// Auto-scroll to bottom
const messagesEndRef = useRef(null);
useEffect(() => {
messagesEndRef.current?.scrollIntoView({ behavior: 'smooth' });
}, [messages]);
return (
<div className="flex flex-col h-screen max-w-3xl mx-auto p-4">
{/* Chat History View */}
<div className="flex-1 overflow-y-auto space-y-6 mb-4">
{messages.map((m) => (
<div
key={m.id}
className={`flex ${m.role === 'user' ? 'justify-end' : 'justify-start'}`}
>
<div className={`rounded-lg p-4 max-w-[80%] ${
m.role === 'user' ? 'bg-blue-600' : 'bg-gray-800'
}`}>
<strong className="block mb-2">{m.role === 'user' ? 'You' : 'Claude'}</strong>
<div className="whitespace-pre-wrap font-mono text-sm leading-relaxed">
{m.content}
</div>
</div>
</div>
))}
{isLoading && <div className="text-gray-500 animate-pulse">Claude is thinking...</div>}
<div ref={messagesEndRef} />
</div>
{/* Input Form */}
<form onSubmit={handleSubmit} className="relative">
<textarea
className="w-full p-4 pr-20 rounded-xl bg-gray-900 border border-gray-700 focus:ring-2 focus:ring-blue-500"
rows={3}
value={input}
onChange={handleInputChange}
placeholder="Ask Claude anything..."
/>
<button
type="submit"
disabled={isLoading || !input.trim()}
className="absolute right-3 bottom-3 bg-white text-black px-4 py-2 rounded-lg font-bold disabled:opacity-50"
>
Send
</button>
</form>
</div>
);
}
When you hook this up, the text will stream in exactly like ChatGPT's official interface. The useChat hook automatically adds your input to the messages array, sends the whole array to the backend, and appends Claude's streamed delta chunks to the last message in the local array.
Phase 3: The Secret Sauce - XML Tag Conditioning
Now that you have the infrastructure working, we need to talk about Prompt Engineering for Claude. This is where 90% of developers fail. They use OpenAI prompt templates on Claude. This is incredibly suboptimal.
Anthropic trained Claude using massive amounts of XML data. If you want structural isolation—meaning you want Claude to separate its "scratchpad thoughts" from its "final answer"—you must use XML tags. Not markdown headers. Not JSON nested objects. XML.
The Double-Tag Implementation
When I build a backend agent using Claude, my system prompt always forces it to think before acting.
"Before providing your final code solution, you MUST outline your architectural decisions inside <scratchpad> tags. Consider the edge cases. Once you have finalized your approach, provide the production ready code inside <solution> tags."
On the backend API route, as Claude streams its response, I use a regex check to strip the <scratchpad> block before it reaches the user's screen. The user never sees the reasoning; they only see the polished, highly optimized <solution> code.
This technique essentially gives you an "o1-style" reasoning pattern using Claude 3.5 Sonnet, with immense speed and much stronger coding accuracy. Note that Anthropic has since introduced a native 'Extended Thinking' mode in Claude 3.7 Sonnet, but controlling the scratchpad manually with XML tags remains the best way to enforce exact formatting protocols.
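As a sketch of that filtering step: a simple regex over the accumulated text does the job. (In a real streaming pipeline you would buffer until the closing tag arrives, since a tag like </scratchpad> can be split across two chunks.)

```javascript
// Strip <scratchpad>...</scratchpad> blocks so the user only sees the solution.
// Non-greedy [\s\S]*? matches across newlines; the trailing \s* eats the
// whitespace left between the scratchpad and the solution.
function stripScratchpad(text) {
  return text.replace(/<scratchpad>[\s\S]*?<\/scratchpad>\s*/g, "");
}

// Illustrative model output using the double-tag prompt from above.
const raw = `<scratchpad>Consider edge cases: empty input, large arrays.</scratchpad>
<solution>const sum = (xs) => xs.reduce((a, b) => a + b, 0);</solution>`;

console.log(stripScratchpad(raw));
// <solution>const sum = (xs) => xs.reduce((a, b) => a + b, 0);</solution>
```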
Phase 4: Prefilling the Assistant Response
Here is a ninja trick unique to the Anthropic API. With OpenAI, the model always begins a brand-new assistant turn. Anthropic, by contrast, lets your request end with a partial 'assistant' message, and Claude will continue writing from exactly where that message stops.
What does this mean? It means you can force Claude to start its sentence with specific text. Let's say you want Claude to generate a JSON object containing SEO metadata, but Claude keeps stubbornly adding conversational fluff like "Here is your JSON:" at the beginning.
// Force Claude to spit out raw JSON by prefilling the opening bracket
const msg = await anthropic.messages.create({
model: "claude-3-5-sonnet-20241022",
max_tokens: 1024,
messages: [
{ role: "user", content: "Extract metadata from this article into JSON format." },
// PREFILLING THE ASSISTANT:
{ role: "assistant", content: "{" }
]
});
// The result will literally start the stream right AFTER the '{' character!
console.log(msg.content[0].text); // Outputs: "title": "My Article", ...
By prefilling the assistant message with a literal open bracket {, Claude's autocomplete nature forces it to immediately start outputting JSON keys. It is all but guaranteed to skip the conversational fluff, because from its perspective it has already started a JSON block. This is the single most powerful technique for structured data extraction I have found.
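One practical detail: the prefilled { lives in your request, not in the response, so the returned text starts after it and you must re-attach it before parsing. A sketch, using a hypothetical continuation string standing in for Claude's actual output:

```javascript
// Hypothetical continuation, standing in for msg.content[0].text
// after a "{" prefill; real output will vary.
const continuation = '"title": "My Article", "keywords": ["claude", "api"]}';

// The prefill character was part of the input we sent, so prepend it when parsing.
const metadata = JSON.parse("{" + continuation);

console.log(metadata.title); // "My Article"
```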
Final Thoughts
Building applications with the Claude API requires a shift in mindset. You must respect the XML tagging protocol, you must manage your serverless streaming timeouts, and you must utilize tools like Prefilling to constrain outputs perfectly.
If you follow the architecture outlined in this article, you will bypass the massive "glue code" pipelines most developers build to sanitize LLM outputs. You will have a lightning-fast, highly deterministic application powered by arguably the best coding intelligence on the planet.