AI API cost calculator · Token-level

What you'll actually pay for inference.

Token volume, model choice, latency requirements, and wrapper margin (Harvey, Glean, Hebbia) all change the bill. Fourteen probing questions. Output: side-by-side $/mo across OpenAI, Claude, Gemini, and self-hosted, plus the wrapper margin if you're going through one. The full memo adds a 12-month cost projection, a model-mix recommendation, and the questions to raise at your next renewal negotiation.

FORMAT: 14 questions · ~3 min
ACCESS: No signup to see result
OUTPUT: full memo · email-gated

How this is calculated

Per-token published pricing across the four major providers is the input table. We adjust for: reasoning depth (basic chat vs Claude Opus / GPT-5 / Gemini Pro tier), context window (long-context surcharges where they apply), multi-modal (vision / audio token weights), streaming overhead, tool-calling round-trips, and wrapper margin if you're going through Harvey / Glean / Hebbia / Cresta / Sierra.
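As a rough illustration of the arithmetic described above, here is a minimal Python sketch. It is not the calculator's actual implementation: the `Pricing` class, the `overhead_multiplier` parameter (a single stand-in for adjustments such as tool-calling round-trips and streaming overhead), and the example prices are all hypothetical. It also assumes wrapper margin is defined on the wrapper's price, i.e. margin = (price - cost) / price.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    """USD per million tokens. Values used below are placeholders, not real list prices."""
    input_per_mtok: float
    output_per_mtok: float


def monthly_api_cost(input_tokens: int, output_tokens: int, pricing: Pricing,
                     overhead_multiplier: float = 1.0) -> float:
    """Direct-API monthly cost in USD.

    overhead_multiplier is a hypothetical catch-all for adjustments such as
    tool-calling round-trips or streaming overhead (1.0 means no adjustment).
    """
    base = ((input_tokens / 1_000_000) * pricing.input_per_mtok
            + (output_tokens / 1_000_000) * pricing.output_per_mtok)
    return base * overhead_multiplier


def with_wrapper_margin(direct_cost: float, margin: float) -> float:
    """Price paid through a wrapper, assuming margin = (price - cost) / price."""
    if not 0 <= margin < 1:
        raise ValueError("margin must be in [0, 1)")
    return direct_cost / (1 - margin)


# Made-up prices and volumes, for illustration only.
example = Pricing(input_per_mtok=2.0, output_per_mtok=8.0)
direct = monthly_api_cost(200_000_000, 50_000_000, example)
via_wrapper = with_wrapper_margin(direct, 0.5)
print(f"direct: ${direct:,.2f}/mo, via wrapper: ${via_wrapper:,.2f}/mo")
```

Under that margin definition, a 50% margin doubles the direct cost. If margin were instead defined as a markup on cost, the wrapper price would be cost × (1 + margin).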

Self-hosted estimates assume Llama-class open weights on AWS Inferentia 2 / Trainium with batched inference. Typical effective cost per million tokens ($/MTok) ranges from $0.30 (small models, high throughput) to $4 (70B+, low throughput).
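One way to arrive at an effective $/MTok figure for self-hosting is to divide the hourly instance cost by the tokens actually served per hour. The sketch below assumes that approach; the function name, the `utilisation` parameter, and the example numbers are hypothetical and do not reflect real instance pricing or measured throughput.

```python
def self_hosted_usd_per_mtok(instance_usd_per_hour: float,
                             tokens_per_second: float,
                             utilisation: float = 1.0) -> float:
    """Effective USD per million tokens for a self-hosted deployment.

    utilisation is the fraction of each hour spent serving traffic
    (an assumption; idle capacity raises the effective rate).
    """
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    if not 0 < utilisation <= 1:
        raise ValueError("utilisation must be in (0, 1]")
    mtok_per_hour = tokens_per_second * 3600 * utilisation / 1_000_000
    return instance_usd_per_hour / mtok_per_hour


# Placeholder inputs: $3.60/hour and 1,000 tokens/second.
fully_used = self_hosted_usd_per_mtok(3.6, 1000)
half_used = self_hosted_usd_per_mtok(3.6, 1000, utilisation=0.5)
print(f"fully utilised: ${fully_used:.2f}/MTok, half utilised: ${half_used:.2f}/MTok")
```

Halving utilisation doubles the effective rate, which is why low-throughput deployments sit at the expensive end of the range.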

What this does NOT estimate

Fine-tuning costs (training compute amortised separately)
Embedding storage + retrieval (vector-DB infra)
Custom moderation / safety-classifier overhead
Engineering time to build + maintain prompt pipelines

For a procurement-ready model-mix recommendation, see the email memo or book a call.

FAQ

How accurate is this estimate?

±25% on the headline figure. Real bills vary with prompt patterns, batch sizes, and provider rate-limit step functions. Treat it as a planning anchor.
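For budgeting, the ±25% band can be turned into a low/high range around the headline number. A minimal sketch, with a hypothetical function name and a made-up headline figure:

```python
def planning_range(headline_usd: float, tolerance: float = 0.25) -> tuple[float, float]:
    """Return (low, high) bounds around a headline estimate."""
    if not 0 <= tolerance < 1:
        raise ValueError("tolerance must be in [0, 1)")
    return headline_usd * (1 - tolerance), headline_usd * (1 + tolerance)


# Made-up headline estimate of $800/mo.
low, high = planning_range(800.0)
print(f"plan for ${low:,.0f} to ${high:,.0f} per month")
```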

Is the wrapper margin really 30-60%?

For Harvey AI we have public reference data showing 50%+ margin on AmLaw 200 deployments. Glean and Hebbia run similar margins. The memo includes the source data and how to verify it against your own contract.
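To check a margin against your own contract, one simple approach is to compare what you pay the wrapper with an estimate of the underlying direct-API cost for the same usage. The sketch below assumes margin is measured on the contract price; the function name and figures are hypothetical.

```python
def implied_wrapper_margin(contract_usd: float, underlying_api_usd: float) -> float:
    """Implied margin, assuming margin = (contract price - API cost) / contract price."""
    if contract_usd <= 0:
        raise ValueError("contract_usd must be positive")
    return (contract_usd - underlying_api_usd) / contract_usd


# Made-up figures: $1,600/mo contract against an estimated $800/mo of direct API usage.
margin = implied_wrapper_margin(1600.0, 800.0)
print(f"implied margin: {margin:.0%}")
```

The result is only as good as the estimate of underlying usage, so it is best read as a rough check, not an audit.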

When does self-host pay back?

Once OpenAI/Claude bills pass ~$80K/yr, Inferentia 2 + Llama 3.1 70B starts beating direct API on $/MTok. Below that, direct API wins on operational simplicity.
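A break-even point like this can be reasoned about with a simple model: self-hosting carries a fixed annual overhead but a lower per-token rate, so it pays back once volume is high enough for the per-token saving to cover the overhead. The sketch below is one such model under stated assumptions, not the calculator's method; the function name and every input are hypothetical, chosen only so the example lands near the threshold quoted above.

```python
from typing import Optional


def break_even_mtok_per_year(fixed_usd_per_year: float,
                             api_usd_per_mtok: float,
                             self_hosted_usd_per_mtok: float) -> Optional[float]:
    """Annual volume (in millions of tokens) above which self-hosting is cheaper.

    Returns None when self-hosting has no per-token saving and so never pays back.
    """
    saving_per_mtok = api_usd_per_mtok - self_hosted_usd_per_mtok
    if saving_per_mtok <= 0:
        return None
    return fixed_usd_per_year / saving_per_mtok


# Illustrative inputs only: $60K/yr fixed overhead, $4/MTok blended API rate,
# $1/MTok effective self-hosted rate.
volume = break_even_mtok_per_year(60_000.0, 4.0, 1.0)
api_bill_at_break_even = volume * 4.0
print(f"break-even at {volume:,.0f} MTok/yr, i.e. an API bill of ${api_bill_at_break_even:,.0f}/yr")
```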

What's in the memo?

12-month cost projection (3 scenarios), model-mix recommendation, prompt-pattern optimisations that cut spend 20-40%, vendor-negotiation questions for renewal.
