The Observability Cost Trap: Why Datadog Bills Spike When You Add LLM Workloads
AI workloads generate 10–50× more telemetry per dollar of business value. Your Datadog contract didn't model that. Here's what to do.

If you added LLM-powered features to your product in the last 18 months and your observability bill spiked 40–200%, you’re not imagining it. Two facts are converging:
- AI workloads generate dramatically more telemetry per business transaction than the workloads observability tools were originally priced for.
- Datadog’s pricing model is per-unit (custom metrics, log GB, traces). The cost scales linearly with telemetry volume.
The combination produces bill increases that look catastrophic until you understand the mechanics. They’re not the vendor being predatory. They’re the natural outcome of a pricing model designed in 2017 meeting a workload designed in 2025.
This post is the diagnostic and the fix.
Why LLM workloads generate so much more telemetry
A pre-AI request flow has roughly this telemetry footprint:
- 1 HTTP trace (entry + a few spans)
- 5–15 application metrics emitted
- A handful of log lines
Total: maybe 20–50 telemetry items per request.
An LLM-powered agent request that does roughly the same work:
- 1 outer HTTP trace
- 3–8 LLM calls, each with prompt-version, model, latency, token count, tool calls
- 5–20 tool invocations (database lookups, API calls, retrieval calls)
- A vector-search span if RAG is involved
- An evaluation telemetry blob (which prompt won, what the score was)
- Full input/output for replay
- Application metrics around each step
Total: 200–800 telemetry items per request. Sometimes more.
For the same end-user transaction, you’re now generating 10–50× more telemetry. If your observability vendor prices on telemetry volume, your bill multiplies in proportion.
The line items that spike
Three Datadog SKUs are typically responsible for >80% of the LLM-induced spike:
1. Custom metrics. Each LLM call has 10–20 dimensions worth tracking — model, prompt version, tenant, latency, token count input, token count output, cost, tool count, eval score. The cardinality compounds badly. We’ve seen single tasks add $8K–$25K/month in custom-metric charges.
2. Logs ingestion. Per-request input/output blobs for replay can be 5–50 KB each. At 10,000 requests/day that’s 50–500 MB/day, ~1.5–15 GB/month per task. Add log search retention pricing and you’re looking at meaningful new line items.
3. APM hosts + custom traces. The span count per trace goes up. Even on a host-based plan, the resulting trace volume can push you into higher tiers.
Combined, these typically add $15K–$80K/month for a moderately AI-active product. That’s $180K–$960K/year of new observability spend, on top of whatever you were already paying.
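The log-volume arithmetic in item 2 is easy to rerun with your own numbers. A minimal sketch, assuming a 30-day month and decimal units (1 GB = 1,000,000 KB); the function name is ours and it deliberately leaves pricing out, since your contracted rate is what matters:

```python
def monthly_log_volume_gb(requests_per_day: int, kb_per_request: float, days: int = 30) -> float:
    """Raw replay-log volume in GB per month (decimal units, 30-day month assumed)."""
    return requests_per_day * kb_per_request * days / 1_000_000


# The range quoted above: 10,000 requests/day at 5-50 KB per request.
low = monthly_log_volume_gb(10_000, 5)
high = monthly_log_volume_gb(10_000, 50)
print(f"{low:.1f}-{high:.1f} GB/month per task")
```

Multiply the result by the number of tasks that log full payloads, then by your own per-GB ingestion and retention rates, to see which line item it lands on.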
The architectural choices that drive the bill up
Three patterns we see in client AI stacks that compound the spike:
1. Instrumenting every prompt as a metric. A team adds 12 prompt variants for an A/B test, each emits 8 custom metrics, and the eval harness emits another 6 per variant. They’ve added 12 × (8 + 6) = 168 custom metrics. They show up on the Datadog cost dashboard a month later.
2. Logging full LLM inputs and outputs to the central log store. Useful for debugging. Catastrophic for cost when each entry is 20 KB and you do this for every call.
3. Long retention defaults. If your central observability stack defaults to 30-day retention on everything, you’re paying retention costs on the high-volume AI logs for a month, when you only need rollback-grade fidelity for ~7 days.
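To see how pattern 1 compounds, it helps to separate metric names from the tag combinations attached to them, because Datadog counts a custom metric per unique combination of metric name and tag values. The sketch below reads the 168 figure as 12 × (8 + 6) and then applies a worst-case multiplier. The tag names and cardinalities are hypothetical, chosen only to illustrate the shape of the growth:

```python
from math import prod


def metric_names(variants: int, metrics_per_variant: int, eval_metrics_per_variant: int) -> int:
    """Number of distinct metrics added by an A/B test across all prompt variants."""
    return variants * (metrics_per_variant + eval_metrics_per_variant)


def worst_case_series(names: int, tag_cardinalities: dict) -> int:
    """Upper bound on billable series: every tag-value combination occurs for every metric."""
    return names * prod(tag_cardinalities.values())


names = metric_names(variants=12, metrics_per_variant=8, eval_metrics_per_variant=6)

# Hypothetical tags: 3 models and 50 tenants. Replace with your real cardinalities.
tags = {"model": 3, "tenant": 50}
series = worst_case_series(names, tags)

print(names)   # 168
print(series)  # 25200
```

Real workloads rarely hit the full cross product, so treat the second number as a ceiling. The point is that a per-tenant tag on a per-variant metric is where the bill actually multiplies.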
The fix, ranked by impact
The three changes that recover the most spend:
1. Move AI workload logs to object storage with intelligent indexing. Tools like OpenObserve and ClickHouse-based log stacks use S3 as the primary store with on-demand indexing. The hot-index economics that drive Datadog log pricing don’t apply. At 2 TB/month of LLM log ingestion, the difference is typically $300K+/year.
2. Move custom AI metrics to a Prometheus-style backend. Per-cardinality pricing was designed for a cardinality model that AI workloads break. Prometheus (or any non-vendor metrics backend) doesn’t have the structural cost. Instrumentation code stays roughly the same; the backend swap is contained.
3. Split telemetry by retention tier. Short retention (7 days, hot) for operational debugging. Medium retention (30 days, warm) for trend analysis. Long retention (1 year+, cold) for compliance and rare audit. Your current stack probably has everything in the same tier. Most teams reclaim 50–70% of log storage cost by moving the warm and cold tiers off hot storage.
The build for these three is real engineering — 8–14 weeks for the full migration — but the math is unambiguous at any meaningful AI workload volume.
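The retention split in item 3 comes down to a small routing table. Here is a minimal sketch using the three tiers and retention windows named above. The telemetry kinds (`llm_io_payload`, `request_summary`, `audit_event`) and the choice to default unknown kinds to the shortest retention are our assumptions for illustration, not a prescribed schema:

```python
from enum import Enum


class Tier(Enum):
    HOT = "hot"    # operational debugging
    WARM = "warm"  # trend analysis
    COLD = "cold"  # compliance and rare audit


# Retention windows from the tiers described above (cold is "1 year+", so 365 is a floor).
RETENTION_DAYS = {Tier.HOT: 7, Tier.WARM: 30, Tier.COLD: 365}

# Hypothetical mapping from telemetry kind to tier. Adjust to your own record types.
KIND_TO_TIER = {
    "llm_io_payload": Tier.HOT,    # full prompts and outputs kept for replay
    "request_summary": Tier.WARM,  # per-request aggregates for trends
    "audit_event": Tier.COLD,      # records kept for compliance
}


def tier_for(kind: str) -> Tier:
    """Pick a tier for a record; unknown kinds get the shortest retention by default."""
    return KIND_TO_TIER.get(kind, Tier.HOT)


def is_expired(kind: str, age_days: int) -> bool:
    """True once a record is older than the retention window of its tier."""
    return age_days > RETENTION_DAYS[tier_for(kind)]


print(tier_for("llm_io_payload").value, is_expired("llm_io_payload", 8))
print(tier_for("request_summary").value, is_expired("request_summary", 8))
```

Defaulting unknown kinds to the hot tier keeps new, unclassified telemetry cheap, at the risk of expiring something you wanted to keep. If compliance data can arrive unlabeled in your stack, flip that default.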
When the rebuild is overkill
Two situations where the spike is real but the rebuild isn’t justified:
1. The AI feature is in pilot. If you’re still validating the use case and might kill it, instrument minimally for now (essential traces only), accept the cost, and revisit when the feature graduates.
2. The AI feature is a small slice of overall observability. If your AI workload is <15% of telemetry volume and you’re at <$15K/month total Datadog spend, the rebuild engineering cost dominates the reclaim.
The rebuild becomes the right answer when the AI workload is structural (will grow) and the bill is already material ($25K+/month). At that point the math runs cleanly.
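The thresholds in this section can be written down as a rough decision helper. This is a sketch of the rules of thumb above, not a formula: the function name and return labels are ours, and anything between the stated thresholds is returned as a judgment call because the text does not cover that range.

```python
def rebuild_recommendation(
    monthly_spend_usd: float,
    ai_share_of_telemetry: float,
    in_pilot: bool,
    workload_will_grow: bool,
) -> str:
    """Map the rules of thumb above to 'defer', 'rebuild', or 'judgment call'."""
    # Situation 1: the feature is still in pilot and might be killed.
    if in_pilot:
        return "defer"
    # Situation 2: AI is a small slice (<15%) of a small bill (<$15K/month).
    if ai_share_of_telemetry < 0.15 and monthly_spend_usd < 15_000:
        return "defer"
    # Structural workload on a bill that is already material ($25K+/month).
    if workload_will_grow and monthly_spend_usd >= 25_000:
        return "rebuild"
    # Everything else falls between the stated thresholds.
    return "judgment call"


print(rebuild_recommendation(40_000, 0.35, in_pilot=False, workload_will_grow=True))
print(rebuild_recommendation(12_000, 0.10, in_pilot=False, workload_will_grow=True))
```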
The cross-cutting recommendation
Whether you rebuild observability or not, this principle holds for any AI workload:
Per-task instrumentation, not per-request. Track cost per business outcome (lead-captured, ticket-resolved, contract-reviewed), not just per request. The economics of AI features are non-linear by task. Per-request metrics hide the workflow-level math that lets you make sane optimisation decisions.
This is the same architectural element from our AI cost per query post. The Datadog rebuild and the LLM provider cost optimisation are the same architectural problem viewed from different sides.
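A minimal sketch of what per-task accounting looks like in practice, assuming each request record already carries a task ID and a cost. The record fields, the `completed_task_ids` set, and the sample figures are hypothetical. The idea is to charge every request, including those from tasks that never finished, against the outcomes that were actually delivered:

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class RequestRecord:
    task_type: str   # e.g. "ticket-resolved"
    task_id: str     # groups the requests that belong to one workflow run
    cost_usd: float  # LLM plus tooling cost attributed to this request


def cost_per_outcome(requests, completed_task_ids):
    """Total spend per task type divided by the number of tasks that reached the outcome."""
    spend = defaultdict(float)
    completed = defaultdict(set)
    for r in requests:
        spend[r.task_type] += r.cost_usd
        if r.task_id in completed_task_ids:
            completed[r.task_type].add(r.task_id)
    # Task types with zero completed outcomes are omitted rather than divided by zero.
    return {t: spend[t] / len(completed[t]) for t in spend if completed[t]}


# Illustrative data: task t1 resolves a ticket in three requests, t2 fails after one.
requests = [
    RequestRecord("ticket-resolved", "t1", 0.02),
    RequestRecord("ticket-resolved", "t1", 0.05),
    RequestRecord("ticket-resolved", "t1", 0.03),
    RequestRecord("ticket-resolved", "t2", 0.04),
]
per_outcome = cost_per_outcome(requests, completed_task_ids={"t1"})
per_request = sum(r.cost_usd for r in requests) / len(requests)

print(f"per outcome: ${per_outcome['ticket-resolved']:.3f}")
print(f"per request: ${per_request:.3f}")
```

In this toy data the per-request average is a quarter of the per-outcome cost, which is the gap that per-request dashboards hide.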
What we ship
Fixed-price engagements that diagnose the observability spike, separate the negotiable spend from the structural spend, and either rebuild around the structural part or fix the architecture to prevent further escalation. The Datadog cost calculator linked below estimates the spike contribution from your AI workload specifically and projects what a rebuild would reclaim.
If your bill went up >30% in the last six months, the rebuild conversation might be worth starting before the next renewal cycle.
Read more: /upstream/datadog-alternative · /agents/ · /calculators/datadog-cost