Token Costs vs Subscription Pricing
Dec 30, 2025 • 20 min read
The #1 way AI startups die: uncapped usage. If a user pays $20/month but generates $50 of OpenAI API calls, you lose $30 per customer. Scale that to 1,000 users and you're burning $30,000/month. This guide gives you the math, the pricing architectures, and the cost optimization techniques to build a sustainable AI business.
1. Understanding LLM Costs: The Math
Every major LLM provider charges per token. Here's how the math translates to real product costs:
| Model | Input $/1M tokens | Output $/1M tokens | Cost per 1k messages (~500 input + 500 output tokens each) |
|---|---|---|---|
| GPT-4o | $5.00 | $15.00 | ~$10.00 |
| GPT-4o-mini | $0.15 | $0.60 | ~$0.38 |
| Claude 3.5 Sonnet | $3.00 | $15.00 | ~$9.00 |
| Gemini 1.5 Flash | $0.075 | $0.30 | ~$0.19 |
| Llama 3.1 (self-hosted) | $0.00 | $0.00 | GPU cost only |
Worked Example: The Fatal Mistake
You launch an AI writing assistant at $20/month. Your average power user:
- Sends 50 messages/day x 30 days = 1,500 messages/month
- Each message: 800 tokens input + 1,200 tokens output = 2,000 tokens
- Total: 3,000,000 tokens/month using GPT-4o
- Cost: 1.2M input x $5 + 1.8M output x $15 = $6 + $27 = $33/month
You're losing $13 per power user monthly. With 500 such users, that's $6,500/month in losses from your best customers, the ones you most want to retain.
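A quick sanity check you can run before setting any price. The helper below is illustrative, not from any billing SDK; prices are per 1M tokens:

```typescript
// Per-user monthly LLM cost from usage assumptions (prices per 1M tokens).
function monthlyLlmCost(
  messages: number,
  inputTokensPerMsg: number,
  outputTokensPerMsg: number,
  inputPricePer1M: number,
  outputPricePer1M: number,
): number {
  const inputCost = ((messages * inputTokensPerMsg) / 1e6) * inputPricePer1M;
  const outputCost = ((messages * outputTokensPerMsg) / 1e6) * outputPricePer1M;
  return inputCost + outputCost;
}

// The power user profile above: 1,500 messages of 800 in / 1,200 out on GPT-4o.
const powerUserCost = monthlyLlmCost(1500, 800, 1200, 5, 15);
console.log(powerUserCost); // roughly $33/month, well over a $20 subscription
```

Run this against your heaviest realistic user profile, not your average one: it's the tail of the usage distribution that decides whether the tier is profitable.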
2. The 5 Pricing Architectures
Architecture 1: Hard Message Limits (Simplest)
Cap the number of messages or actions per billing period. Simple to implement, easy to communicate to users. Best for early-stage products where risk management matters most:
- Free tier: 50 messages/month
- Pro $20/month: 1,000 messages/month
- Enterprise: Unlimited (with your own key)
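A minimal sketch of enforcing such a cap server-side. Tier names and limits mirror the example above; where the per-period counter comes from (your database, usually) is assumed:

```typescript
// Per-tier message caps; `undefined` means unlimited (enterprise BYOK).
const TIER_LIMITS: Record<string, number | undefined> = {
  free: 50,
  pro: 1000,
  enterprise: undefined,
};

function canSendMessage(tier: string, usedThisPeriod: number): boolean {
  const limit = TIER_LIMITS[tier];
  if (limit === undefined) return true; // no cap on this tier
  return usedThisPeriod < limit;
}

console.log(canSendMessage("free", 49)); // true
console.log(canSendMessage("free", 50)); // false: cap reached, prompt an upgrade
```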
Architecture 2: Credit System (Most Flexible)
Users buy credits. Different actions cost different credits based on compute. This lets you price by value delivered:
const CREDIT_COSTS = {
chat_message: 5, // Simple Q&A — cheap to serve
document_analysis: 25, // Long context — moderate cost
image_generation: 50, // Expensive multimodal
code_review: 30, // Medium complexity
};
// User buys 1,000 credits for $10 (~$0.01/credit)
// Simple chat = $0.05 per message (very fair)
// Document analysis = $0.25 per doc (reflects value)
Architecture 3: Bring Your Own Key (BYOK)
Users provide their own OpenAI API key. You sell the UX and workflows, not the compute. Zero per-request cost to you. Best for developer tools:
import OpenAI from "openai";

const userApiKey = await getUserApiKey(userId);
if (!userApiKey) {
  return { error: "Add your OpenAI API key in Settings to continue." };
}
const openai = new OpenAI({ apiKey: userApiKey }); // Their billing
Architecture 4: Per-Seat B2B
For enterprise tools, charge per seat with usage caps. Predictable for buyers, safe for you. Typical structure: Starter $50/seat (500 actions), Business $100/seat (2,000 actions), Enterprise: custom.
Architecture 5: Outcome-Based Pricing
Charge based on outcomes, not compute. An AI writing job descriptions? Charge per job post published. An AI generating sales emails? Charge per reply received. This aligns your revenue with customer success but is complex to implement.
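Stripped to its core, outcome-based billing means metering confirmed outcome events rather than tokens. The event shape and price below are hypothetical:

```typescript
// Hypothetical outcome event log; only confirmed outcomes are billable.
type OutcomeEvent = { userId: string; outcome: "job_post_published" };
const PRICE_PER_OUTCOME = 2.5; // $2.50 per published post (illustrative)

function invoiceTotal(events: OutcomeEvent[], userId: string): number {
  const billable = events.filter((e) => e.userId === userId).length;
  return billable * PRICE_PER_OUTCOME;
}

const events: OutcomeEvent[] = [
  { userId: "u1", outcome: "job_post_published" },
  { userId: "u1", outcome: "job_post_published" },
  { userId: "u2", outcome: "job_post_published" },
];
console.log(invoiceTotal(events, "u1")); // 5: two outcomes at $2.50 each
```

The hard part isn't the billing math; it's defining "confirmed outcome" in a way both you and the customer trust, which is why this model is usually reserved for mature products.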
3. Cost Optimization Techniques
Model Routing — The Biggest Win
Not every request needs GPT-4o. Route simple requests to cheap models:
function selectModel(taskType: string): string {
const hardTasks = ['code_review', 'reasoning', 'complex_analysis'];
if (hardTasks.includes(taskType)) return 'gpt-4o'; // $5/1M
return 'gpt-4o-mini'; // $0.15/1M — 33x cheaper!
}
// For most products, 70-80% of requests use the mini model.
// This alone reduces LLM costs by 60-75%.
Prompt Caching
Large, static system prompts (instructions, personas, reference docs) can be cached. Anthropic charges 90% less for cached tokens. OpenAI charges 50% less. For apps with 5k+ token system prompts, this alone can halve your input costs.
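Back-of-envelope savings, assuming a cached read costs roughly 10% of the base input rate (Anthropic) or 50% (OpenAI), and ignoring Anthropic's one-time cache-write surcharge:

```typescript
// Input cost per request when the static system prompt is served from cache.
function inputCostWithCaching(
  cachedTokens: number,         // static system prompt, read from cache
  dynamicTokens: number,        // fresh per-request content
  basePricePer1M: number,
  cachedReadMultiplier: number, // ~0.1 for Anthropic, ~0.5 for OpenAI
): number {
  return (
    (cachedTokens / 1e6) * basePricePer1M * cachedReadMultiplier +
    (dynamicTokens / 1e6) * basePricePer1M
  );
}

// 8k-token system prompt + 500-token question on Claude 3.5 Sonnet ($3/1M input):
const uncached = ((8000 + 500) / 1e6) * 3;              // $0.0255 per request
const cached = inputCostWithCaching(8000, 500, 3, 0.1); // $0.0039 per request
```

At these assumed numbers the cached request costs about 15% of the uncached one, which is where the "halve your input costs" claim comes from even for smaller prompts.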
Output Compression
Output tokens cost 3-5x more than input. Force conciseness in your system prompt: "Be concise. No filler phrases. No summary repetition." Use max_tokens to cap responses. Use JSON structured output — it's more compact than prose.
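Put together, a request shaped for compact output might look like this. Parameter names follow the openai npm package's chat completions API; the prompt text is illustrative:

```typescript
const userInput = "Summarize this support ticket in 3 bullet points.";

// Cap billable output tokens and force compact structured output.
const request = {
  model: "gpt-4o-mini",
  max_tokens: 300, // hard ceiling on output spend per request
  response_format: { type: "json_object" as const }, // JSON beats prose for density
  messages: [
    { role: "system" as const, content: "Be concise. No filler. Respond in JSON." },
    { role: "user" as const, content: userInput },
  ],
};
```

Note that `response_format: json_object` requires the word "JSON" to appear somewhere in your messages, which the system prompt above satisfies.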
Streaming + Early Stop
With streaming enabled, users who get their answer after 200 tokens and close the tab save you the cost of 800 more tokens. You only pay for tokens actually streamed to the client.
4. Real-Time Cost Tracking
// Per-1M-token prices for the models you route between (example values from the table above).
const MODEL_COSTS: Record<string, { input: number; output: number }> = {
  'gpt-4o': { input: 5.0, output: 15.0 },
  'gpt-4o-mini': { input: 0.15, output: 0.6 },
};
const USER_LIMIT = 30; // example cap, in $/user/month

type Usage = { prompt_tokens: number; completion_tokens: number };

async function trackUsage(userId: string, model: string, usage: Usage) {
  const cost = (usage.prompt_tokens / 1e6) * MODEL_COSTS[model].input
    + (usage.completion_tokens / 1e6) * MODEL_COSTS[model].output;
  await db.usageLog.create({ data: { userId, cost, ...usage } });
  const monthTotal = await getMonthlySpend(userId);
  if (monthTotal >= USER_LIMIT) {
    await blockUser(userId, 'quota_exceeded'); // hard stop at the cap
  } else if (monthTotal >= USER_LIMIT * 0.8) {
    await sendWarningEmail(userId, monthTotal, USER_LIMIT); // warn at 80%
  }
}
5. Real-World Pricing Examples
ChatPDF-style App
Free: 3 documents, 20 questions. Pro $15/month: 50 documents, unlimited Q&A. Average cost at scale using GPT-4o-mini with cached document context: ~$3.50/user/month. Gross margin: ~77%.
AI Code Review Tool (B2B)
$29/developer/month, up to 200 code reviews. Average cost per review using GPT-4o: $0.12. Max cost at 200 reviews: $24. Margin is thin here — switch to GPT-4o-mini for diff analysis and GPT-4o only for complex reasoning to target 60%+ margins.
AI Content Generator (Consumer)
Free: 5 articles/month. Pro $12/month: unlimited (with 1,500-word cap per article). Cost ceiling using GPT-4o-mini: ~$2.00/user/month. Gross margin: 83%.
Frequently Asked Questions
Should I absorb costs or pass them to users?
Most consumer products absorb costs into subscription pricing — direct pass-through creates friction. The exception is developer tools where technical users expect BYOK. B2B products often use BYOK for enterprise tiers where customers have existing cloud commitments.
How do I prevent a single user from bankrupting me?
Implement hard quota enforcement at the API level (not client-side). Use a real-time counter in Redis or your database, checked before every LLM call. Never rely on honor systems or daily jobs to detect overages — by then, damage is done. For safety, also set rate limits (max 10 requests/minute) to prevent abuse bursts.
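An in-memory sketch of the fixed-window rate check. In production the counter would live in Redis (INCR plus EXPIRE) so limits hold across server instances; names here are illustrative:

```typescript
// userId -> counter for the current one-minute window
const windows = new Map<string, { windowStart: number; count: number }>();

function allowRequest(userId: string, maxPerMinute: number, now = Date.now()): boolean {
  const windowStart = Math.floor(now / 60_000) * 60_000;
  const entry = windows.get(userId);
  if (!entry || entry.windowStart !== windowStart) {
    windows.set(userId, { windowStart, count: 1 }); // new window: first request
    return true;
  }
  entry.count += 1;
  return entry.count <= maxPerMinute;
}
```

Call this before every LLM request and reject with a 429 when it returns false; pair it with the monthly spend check so both burst abuse and slow-burn overages are covered.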
What's a healthy gross margin for AI SaaS?
Traditional SaaS targets 80%+ gross margins. AI SaaS realistically targets 50-75% at maturity. Early stage: prioritize margins above 40% minimum, then automate optimizations (model routing, caching) to improve over time. Below 30% and you have a unit economics problem that growth will only amplify.
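For reference, the margin math behind those thresholds is just the standard gross margin formula, with LLM spend as your cost of goods sold:

```typescript
// Gross margin = (revenue - cost of serving) / revenue
function grossMargin(monthlyRevenue: number, monthlyCogs: number): number {
  return (monthlyRevenue - monthlyCogs) / monthlyRevenue;
}

console.log(grossMargin(15, 3.5).toFixed(2)); // "0.77", the ChatPDF-style example
console.log(grossMargin(20, 14));             // 0.3, the danger zone
```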
Conclusion
Sustainable AI product pricing requires treating token costs as a core business metric from day one. Start with conservative usage limits, instrument your cost tracking before launch, and experiment with credit systems or BYOK to de-risk your unit economics. The businesses thriving in AI are those that found the right balance between pricing for the value they deliver and managing the compute costs of delivering it.
Vivek
AI Engineer
Full-stack AI engineer with 4+ years building LLM-powered products, autonomous agents, and RAG pipelines. I've shipped AI features to production for startups and worked hands-on with GPT-4o, LangChain, LlamaIndex, and the Vercel AI SDK. I started OpnCrafter to share everything I wish I had when learning — no fluff, just working code and real-world context.