
Mintlify ran the numbers on their AI documentation assistant and found they were spending $70,000 a year. For reading static docs. That's not a training cost or an infrastructure cost — it's pure inference burn on a feature that answers questions from content that doesn't change. Most companies never even calculate this number.
## The Problem: Nobody Measures Per-Conversation Cost
Here's the pattern we see across the industry. A team prototypes an AI feature — chatbot, support agent, internal assistant. It works. Users like it. They ship it. Nobody asks: **what does each conversation actually cost us?**
The cost stack looks deceptively simple at small scale:
- **API calls:** GPT-4, Claude, Gemini — $0.01-0.06 per conversation depending on context length
- **Embedding/RAG pipeline:** $0.002-0.01 per query for retrieval
- **Reranking and filtering:** another $0.001-0.005 per query
- **Infrastructure:** vector databases, caching layers, monitoring
At 1,000 conversations a month, this feels free. At 10,000, it's a line item. At 850,000 — Mintlify's volume — it's $70K a year on content that **never changes**.
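That curve is worth sanity-checking with a few lines of arithmetic. The per-conversation figure below is an illustrative midpoint drawn from the ranges above, not a quoted price:

```python
# Rough annual cost at different volumes, using an illustrative
# all-in cost of $0.007 per conversation (API + retrieval + reranking).
PER_CONVERSATION = 0.007  # dollars, illustrative midpoint

for monthly_volume in (1_000, 10_000, 850_000):
    annual = monthly_volume * 12 * PER_CONVERSATION
    print(f"{monthly_volume:>9,} conv/mo -> ${annual:,.0f}/yr")
```

At 850K conversations a month this lands at about $71K a year, within a few percent of Mintlify's figure, which is why the "feels free at 1K" intuition is so dangerous.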
The problem isn't that AI is expensive. The problem is that teams build the most sophisticated retrieval architecture possible, then scale it without checking if the sophistication is earning its keep.
A new paper on [Batched Contextual Reinforcement (BCR)](https://arxiv.org/abs/2604.02322) quantifies the waste: most reasoning pipelines spend 15.8% to 62.6% of their tokens on work that doesn't improve the answer. That's not a rounding error. At scale, it's a salary.
## The Solution: AI Cost Audit Before You Scale
The fix isn't to stop building AI features. It's to **audit the cost structure before you're locked in**. Here's the framework:
**Step 1: Calculate true per-conversation cost.** Not just API calls. Include embedding, retrieval, reranking, infrastructure amortization, and engineering maintenance. Most teams only count the LLM API bill and miss 40-60% of the real cost.
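A back-of-the-envelope model of that all-in figure might look like the sketch below. Every input is a hypothetical placeholder; plug in your own metered numbers:

```python
# Sketch of an all-in per-conversation cost model. All figures
# are hypothetical placeholders, not real rates.
def per_conversation_cost(
    llm_api: float,            # metered LLM spend per conversation
    embedding: float,          # embedding + retrieval per conversation
    reranking: float,          # reranking/filtering per conversation
    infra_monthly: float,      # vector DB, cache, monitoring per month
    eng_hours_monthly: float,  # maintenance engineering hours per month
    eng_hourly_rate: float,
    conversations_monthly: int,
) -> float:
    # Amortize fixed monthly costs across the conversation volume.
    fixed = infra_monthly + eng_hours_monthly * eng_hourly_rate
    return llm_api + embedding + reranking + fixed / conversations_monthly

cost = per_conversation_cost(
    llm_api=0.03, embedding=0.005, reranking=0.002,
    infra_monthly=1_500, eng_hours_monthly=20, eng_hourly_rate=100,
    conversations_monthly=50_000,
)
print(f"${cost:.4f} per conversation")  # vs. $0.03 if you only count the API bill
```

With these illustrative inputs, the API-only view ($0.03) captures well under half of the true per-conversation cost ($0.107), which is the blind spot Step 1 is meant to close.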
**Step 2: Classify your content.** Is your AI reading structured content (docs, code, knowledge bases) or unstructured content (chat logs, emails, transcripts)? Mintlify's revelation: for structured content, vector search is pure overhead. A virtual filesystem — `cat`, `grep`, `ls` — gets better results at near-zero marginal cost.
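As a rough sketch of that idea (an assumption about the shape of such a tool, not Mintlify's actual implementation), the retrieval "tool" for structured docs can literally be grep over a directory:

```python
# Minimal sketch of the "virtual filesystem" idea: give the agent
# shell-style lookups over a docs directory instead of vector search.
# The tool shape and layout here are illustrative assumptions.
import subprocess

def grep_docs(pattern: str, docs_dir: str = "docs/") -> str:
    """Return matching lines with file names and line numbers,
    the way an agent's grep tool would."""
    result = subprocess.run(
        ["grep", "-rn", "-i", pattern, docs_dir],
        capture_output=True, text=True,
    )
    return result.stdout

# The agent gets ls, cat, and grep as tools; the marginal cost of a
# query is just the tokens for the matched lines -- no embedding compute,
# no chunking, no vector database to keep in sync.
```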
**Step 3: Apply the BCR principle.** The BCR results show you can cut reasoning tokens dramatically without losing accuracy. The "accuracy-efficiency trade-off" is largely a myth: teams are paying for tokens that don't help. Smart batching, context pruning, and task-adaptive budgeting cut costs 15-62% while maintaining accuracy.
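Context pruning, the simplest of those levers, can be sketched in a few lines. The greedy scoring and whitespace token count below are illustrative simplifications, not the BCR algorithm itself:

```python
# Trim retrieved context to a hard token budget before the LLM call,
# keeping the highest-scoring chunks first. Illustrative sketch only.
def prune_context(chunks: list[tuple[str, float]], budget_tokens: int) -> list[str]:
    """chunks: (text, relevance_score) pairs from any retriever."""
    kept, spent = [], 0
    for text, score in sorted(chunks, key=lambda c: c[1], reverse=True):
        tokens = len(text.split())  # crude whitespace token estimate
        if spent + tokens > budget_tokens:
            continue  # skip chunks that would blow the budget
        kept.append(text)
        spent += tokens
    return kept

chunks = [
    ("relevant passage " * 50, 0.9),   # 100 "tokens"
    ("marginal passage " * 50, 0.4),
    ("noise passage " * 50, 0.1),
]
pruned = prune_context(chunks, budget_tokens=120)  # only the top chunk fits
```

Even this naive version drops two-thirds of the context in the example; production systems would use a real tokenizer and a learned relevance score, but the budget discipline is the point.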
**Step 4: Set a cost ceiling per conversation.** Pick a number. $0.05? $0.01? $0.001? Whatever makes your unit economics work. Then engineer backwards from that constraint. Most teams engineer forwards — build the best system, then hope the cost is acceptable.
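Engineering backwards from a ceiling is mechanical once you write it down. The per-token prices below are illustrative assumptions, not quotes from any provider:

```python
# Pick a per-conversation ceiling, then derive the input-token budget
# it allows. Prices are illustrative per-token rates, not real quotes.
CEILING = 0.01             # dollars per conversation
INPUT_PRICE = 3.00 / 1e6   # $/input token, illustrative
OUTPUT_PRICE = 15.00 / 1e6 # $/output token, illustrative

def max_input_tokens(ceiling: float, expected_output_tokens: int) -> int:
    """Input-token budget left after reserving spend for the output."""
    remaining = ceiling - expected_output_tokens * OUTPUT_PRICE
    return max(0, int(remaining / INPUT_PRICE))

budget = max_input_tokens(CEILING, expected_output_tokens=400)
print(budget)  # the context window you can actually afford
```

Under these assumed prices, a $0.01 ceiling with a 400-token answer leaves roughly 1,300 input tokens, a hard constraint that makes decisions like "how many retrieved chunks do we send?" concrete instead of aspirational.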

## Benchmarks: What Optimization Actually Saves
Let's look at real numbers from recent research and industry cases:
**Mintlify's filesystem approach:**
- Eliminated embedding compute entirely for structured docs
- Marginal cost per query dropped to near zero
- Accuracy improved (no embedding noise, no chunking artifacts)
- 850K monthly conversations at effectively $0 marginal cost
**BCR token reduction:**
- 15.8% to 62.6% reduction across 5 math benchmarks
- No accuracy loss; in some cases, accuracy improved
- Applies to reasoning chains, multi-step tasks, and long-context queries
**Common optimization patterns that actually work:**
- **Cache aggressively:** if users ask the same questions, serve cached responses (saves 30-70% on repetitive queries)
- **Right-size models:** don't use GPT-4 for classification tasks a 3B model handles fine (saves 80-95% per call)
- **Prune context:** most systems send far more context than needed (saves 20-40% per call)
- **Batch where possible:** BCR shows batching context across related queries saves 15-60% of tokens
**The honest caveat:** These optimizations aren't free. They require engineering time to implement. But the ROI is typically 10-50x within the first quarter. Mintlify's ChomaFs took weeks to build. It saves $70K/year. That's a no-brainer.
## The Business Impact: The Math You're Not Doing
Let's put this in terms that matter — dollars and headcount.
**Scenario 1: Small SaaS (50K monthly AI conversations)**
- Typical cost: $8-15K/year in API + infrastructure
- Optimized cost: $2-5K/year
- Annual savings: $6-10K
- That's not life-changing, but it's a junior dev's salary in many markets

**Scenario 2: Mid-size platform (500K monthly conversations)**
- Typical cost: $50-80K/year
- Optimized cost: $10-25K/year
- Annual savings: $40-55K
- Now we're talking real engineering budget

**Scenario 3: Enterprise scale (5M+ monthly conversations)**
- Typical cost: $500K-1M/year
- Optimized cost: $100-300K/year
- Annual savings: $400-700K
- That's a team. That's a product line. That's competitive advantage.
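The three scenarios reduce to a small table. The midpoints below are just a convenient summary of the ranges given above:

```python
# Midpoint savings per scenario, from the (typical, optimized)
# annual-cost ranges above. Midpoints are a summary convenience.
scenarios = {
    "small SaaS": ((8_000, 15_000), (2_000, 5_000)),
    "mid-size":   ((50_000, 80_000), (10_000, 25_000)),
    "enterprise": ((500_000, 1_000_000), (100_000, 300_000)),
}

def midpoint(rng: tuple[int, int]) -> float:
    return sum(rng) / 2

for name, (typical, optimized) in scenarios.items():
    savings = midpoint(typical) - midpoint(optimized)
    print(f"{name:<11} saves ~${savings:,.0f}/yr")
```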
The companies that do this math early build sustainable AI features. The companies that don't do it wake up in 18 months wondering why their AI margins are negative.
**The framework is one question:** *What does each conversation cost us, all-in, and is every dollar of that cost earning its keep?* If you can't answer that in 30 seconds, you have a problem.
## The Takeaway
The AI gold rush created a culture of "build first, measure never." Teams ship features with embedded vector databases, multi-model pipelines, and sophisticated retrieval architectures — then never check if a simpler approach would work 95% as well for 10% of the cost.
$70K/year to read static docs is not an edge case. It's what happens when nobody runs the numbers. **The best AI feature is one you can afford to run forever — not one that looks impressive in a demo and bleeds money in production.**
Audit your costs. Optimize before you scale. And remember: the most expensive token is the one that doesn't help.