
Datadog: AI Is Hitting Infrastructure Limits — ERP Cloud Strategy Must Plan AI Infra First

09 May

Datadog's State of AI Engineering 2026 report opens with a stark headline: AI is hitting an infrastructure ceiling. The standout numbers: roughly 5% of AI commands in production fail, and over 60% of those failures trace back to system capacity. The models are smart enough; the infrastructure simply can't keep up with demand. For Thai organizations planning to embed AI into their ERP, this is a clear signal: "Choosing an AI tool matters less than choosing an AI architecture."

Quick summary: What does Datadog's State of AI Engineering 2026 actually say?

  • ~5% of AI commands fail in production — 1 in 20 user requests don't complete
  • 60%+ of those failures stem from infrastructure capacity — not model bugs
  • 69% of organizations run 3+ AI models simultaneously — complexity multiplies
  • "Operational maturity gap" — adoption outpaces operational readiness
  • Thailand adopts AI fast but operational readiness lags Singapore
  • Datadog's takeaway: "AI winners won't be those with the best models — they'll be the ones who build the best operational controls around those models."

1. What Datadog Actually Found in State of AI 2026

Datadog is an observability platform that monitors customer systems worldwide (cloud, applications, AI services). The State of AI Engineering 2026 report draws on real production telemetry, not survey responses, which makes the numbers a faithful reflection of the market's actual infrastructure state.

| Area | Datadog's number | What it means |
| --- | --- | --- |
| Failure rate | ~5% of AI commands fail | 1 in 20 user requests hits an error or timeout |
| Primary failure cause | 60%+ from capacity issues | GPU/inference queues maxed out, not bugs |
| Multi-model adoption | 69% run 3+ models | Mixing OpenAI + Claude + open-source, not single-vendor |
| System complexity | Growing exponentially | Hard to monitor, harder to troubleshoot, vendor lock-in compounds |
| User impact | Slower responses, more errors, degraded UX | AI ROI evaporates as users abandon the feature |

A 5% failure rate sounds small, but if your ERP runs 10,000 AI requests per day, that's 500 failed requests every day in a system handling finance, accounting, and inventory decisions. That is a real operational risk that demands planning. See AI Adoption Gap for why Thai organizations adopt fast but struggle to scale.
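To make the scale concrete, the back-of-envelope math above can be sketched as follows (an illustration only; plug in your own ERP's request volume):

```python
def expected_failures(requests_per_day: int, failure_rate: float = 0.05) -> int:
    """Expected failed AI requests per day at Datadog's observed ~5% rate."""
    return round(requests_per_day * failure_rate)

print(expected_failures(10_000))  # 500 failed requests per day
```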

2. Why Is AI Hitting the Infrastructure Ceiling? — Plain English

Here's the simple version. AI inference runs on GPUs (graphics processing units), which are extraordinarily expensive (an Nvidia H100 retails at USD 25,000-40,000 per card) and power-hungry. Building new GPU-rich data centers takes 1-2 years just for construction — plus the wait for hardware delivery.

But AI demand grows faster than that. The Stanford AI Index 2026 notes that training compute grows 4-5× per year, while infrastructure capacity expands only 2-3× per year. The result:

  • Higher queue depth — your request waits in line for a free GPU
  • Latency spikes — response times deteriorate during peak hours
  • Vendor throttling — OpenAI/Anthropic may rate-limit even enterprise customers under heavy load
  • Delayed model rollouts — new models ship but capacity isn't there to use them

This risk is concentrated in a handful of data centers — IEEE Spectrum reports most AI-grade data centers cluster in a few US locations. If one goes down, customers worldwide — Thailand included — are affected. See AI in ERP 2026 for more.
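When a vendor throttles under load (the third bullet above), the standard client-side mitigation is exponential backoff with jitter. A minimal sketch, assuming a hypothetical RateLimitError; real AI SDKs raise their own exception types on HTTP 429:

```python
import random
import time

class RateLimitError(Exception):
    """Stand-in for the throttling (HTTP 429) error a real AI SDK would raise."""

def call_with_backoff(call, max_retries: int = 5, base_delay: float = 1.0):
    """Retry a throttled AI call with exponential backoff plus jitter."""
    for attempt in range(max_retries):
        try:
            return call()
        except RateLimitError:
            if attempt == max_retries - 1:
                raise  # capacity still exhausted after all retries
            # double the wait each attempt; jitter avoids synchronized retries
            time.sleep(base_delay * 2 ** attempt + random.uniform(0, base_delay))
```

Backoff does not create capacity, but it keeps a brief throttling event from cascading into hundreds of hard failures inside the ERP.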

3. 5 Architecture Decisions ERP Must Make Before Adding AI

Picking AI for your ERP is not "just plug in ChatGPT and call the API." There are 5 architectural decisions that determine reliability, cost, and data sovereignty:

| Decision | Options | Impact |
| --- | --- | --- |
| 1. Inference location | Cloud (US/SG) / On-premise GPU / Hybrid | Latency, monthly cost, data sovereignty |
| 2. Acceptable latency | Sync (<2 s) / Async (15+ s OK) | Different use cases: chat must be fast, summary reports can wait |
| 3. SLA / capacity guarantee | Pay-as-you-go / Reserved / Dedicated | Reserved costs more but guarantees throughput at peak |
| 4. Cost model | Per-token / Flat monthly / Per-seat | Per-token is volatile and hard to budget |
| 5. Fallback / graceful degradation | Cache / Smaller model / Disable feature | When AI lags or fails, the core ERP must keep running |

The question executives must ask their IT team: if OpenAI or Claude went down today, would our ERP still function? If the answer is no, your architecture is too tightly coupled to one vendor. See Ollama Self-host Security for self-hosting options.
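Decision 5 (fallback / graceful degradation) can be sketched as a chain: cached answer first, then the primary model, then a smaller fallback model, and finally a safe "feature off" message. All names here (cache, primary_model, fallback_model) are illustrative placeholders, not a real vendor SDK:

```python
cache: dict = {}

def primary_model(text: str) -> str:
    # Placeholder for a frontier-model API call; simulated as unavailable here.
    raise ConnectionError("vendor down")

def fallback_model(text: str) -> str:
    # Placeholder for a smaller local / open-source model.
    return "summary: " + text[:40]

def summarize(text: str) -> str:
    if text in cache:
        return cache[text]                  # 1. serve a cached answer
    for model in (primary_model, fallback_model):
        try:
            result = model(text)
            cache[text] = result            # 2. primary model, then smaller model
            return result
        except Exception:
            continue
    return "[AI summary unavailable; core ERP keeps running]"  # 3. disable feature
```

The important property is the last line: even with every model down, the function returns a harmless message instead of taking an ERP workflow down with it.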

4. On-premise vs Cloud — Trade-offs for AI in ERP

This is the trade-off every organization must consider. There's no single right answer — it depends on size, data sensitivity, and budget:

| Factor | On-premise GPU | Cloud AI service |
| --- | --- | --- |
| Capital cost (upfront) | High: GPUs + server room | Low: pay-as-you-go |
| Operating cost (long-term) | Stable: power + maintenance | Variable: can blow your budget |
| Latency | Lowest: on the LAN | Depends on data center (US: 200+ ms) |
| Data sovereignty | Data never leaves the org; fits PDPA | Data crosses borders; DPA required |
| Capacity guarantee | Bound by hardware; slow to scale | Elastic, but subject to throttling |
| Available models | Open-source (Llama, Qwen, DeepSeek) | Frontier (GPT, Claude, Gemini) |
| Best fit for | Sensitive accounting/HR + steady load | General summarization + variable load |

In practice, many large organizations choose a hybrid approach — cloud for general tasks, on-premise for PDPA-sensitive or confidential data. See AI Investment ROI for how to think about the spend.
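The hybrid split can start as a routing rule keyed on data sensitivity. A sketch (the module names and the PDPA-sensitive set are assumptions for illustration, not a fixed list):

```python
SENSITIVE_MODULES = {"accounting", "hr", "payroll"}  # assumed PDPA-sensitive set

def route_inference(module: str) -> str:
    """Send PDPA-sensitive workloads to on-premise GPUs, the rest to cloud AI."""
    return "on_premise" if module in SENSITIVE_MODULES else "cloud"

print(route_inference("hr"))         # on_premise
print(route_inference("reporting"))  # cloud
```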

5. Questions to Ask Your Cloud ERP Vendor — Before You Sign

If your cloud ERP vendor says "we have an AI Assistant," ask these 7 follow-up questions. Most vendors won't be able to answer all of them:

7 questions for any Cloud ERP vendor:

  1. Where does your AI inference run? — If the answer is "Azure US", your data leaves Thailand on every call
  2. What's the SLA on the AI service? — Core ERP may be 99.9%, but the AI sub-component might be 99.0% (nearly 4 days of allowed downtime per year)
  3. What's the fallback when AI is down? — If the answer is "none", the feature isn't dependable
  4. How is AI billed? — Per-token billing explodes with heavy use
  5. Will our data train your models? — Demand a written guarantee
  6. If we cancel, are our data and embeddings fully deleted? — Critical for PDPA compliance
  7. Is an on-premise option available? — For organizations needing data sovereignty

If a vendor can't answer even one of these, they haven't thought architecture through deeply enough — and you risk becoming their "experimental customer." See AI Tools for Business for more.

6. Saeree ERP's Approach to AI Infrastructure

Saeree ERP is currently developing an AI Assistant for accounting, inventory, and HR (in training during 2026). Our approach is grounded in choosing architecture that fits Thai organizations:

| Area | Saeree ERP approach |
| --- | --- |
| Deployment | Both on-premise and cloud; customers choose |
| AI architecture | ERP keeps working even if AI is down; AI is an add-on, not a critical path |
| Data sovereignty | On-premise: data never leaves the org, fits PDPA |
| Security | SSL A+, 2FA, role-based access; fit for sensitive data |
| Cost | Choose CAPEX (on-prem) or OPEX (cloud); not forced into either |
| Vendor dependency | Open-source stack (PostgreSQL, Linux); no single-vendor lock-in |

Our principle is simple — "AI inside ERP should be a tool that helps the team, not a critical dependency that brings the ERP down when the AI service falters."

7. What Thai Organizations Should Do Today

Before signing a cloud ERP contract that bakes AI features deeply in, do these 5 things:

  • 1. Audit yourselves — list which workflows truly need AI (most don't)
  • 2. Separate critical vs. nice-to-have — closing the books and issuing tax invoices = critical; executive summary reports = nice-to-have
  • 3. Run a free trial — a good vendor offers 1-2 months before you commit
  • 4. Measure real failure rates — log how often AI fails or lags during real use
  • 5. Build fallbacks — every workflow needs a "manual mode" for when AI is unavailable
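Step 4 (measure real failure rates) needs nothing more than a counter wrapped around each AI call during the trial period. A minimal sketch:

```python
class AIFailureLog:
    """Track how often AI calls fail during a trial period (step 4 above)."""

    def __init__(self) -> None:
        self.total = 0
        self.failed = 0

    def record(self, succeeded: bool) -> None:
        self.total += 1
        if not succeeded:
            self.failed += 1

    @property
    def failure_rate(self) -> float:
        return self.failed / self.total if self.total else 0.0

log = AIFailureLog()
for ok in [True] * 19 + [False]:   # 19 successes, 1 failure
    log.record(ok)
print(log.failure_rate)            # 0.05
```

If your own measured rate approaches Datadog's ~5%, that is the evidence you need before committing the workflow to production.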

Summary

| Finding | ERP lesson |
| --- | --- |
| 5% AI failure rate | Design the ERP to keep running when AI fails |
| 60% from capacity | Choose an architecture with an explicit SLA / reserved capacity |
| 69% run 3+ models | Build an abstraction layer; avoid single-vendor coupling |
| Operational maturity gap | Invest in observability and monitoring from day 1 |
| Sensitive data | Consider on-premise for accounting/HR/PII workloads |
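The "abstraction layer" lesson amounts to one internal entry point in front of every provider. A sketch (the registry and its callables are hypothetical stand-ins for real vendor SDK clients):

```python
from typing import Callable, Dict

# Hypothetical registry; each callable stands in for a real vendor SDK client.
PROVIDERS: Dict[str, Callable[[str], str]] = {
    "openai": lambda prompt: "[openai] " + prompt,
    "claude": lambda prompt: "[claude] " + prompt,
    "local":  lambda prompt: "[local] " + prompt,
}

def complete(prompt: str, provider: str = "openai") -> str:
    """Single entry point: swapping vendors becomes a config change,
    not a rewrite scattered across the ERP codebase."""
    return PROVIDERS[provider](prompt)

print(complete("close Q3 books", provider="local"))  # [local] close Q3 books
```

With 69% of organizations already running three or more models, this thin layer is what keeps a vendor change from becoming a migration project.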

"Picking AI for your ERP isn't picking a tool — it's picking an architecture that will live with your organization for 5-10 years. If your cloud ERP vendor can't tell you where inference runs, what the SLA is, what the fallback looks like, or whether on-premise is even an option — you're about to become their experimental customer, not a customer who'll get a stable system."


Saeree ERP supports both on-premise and cloud — designed so the ERP keeps working even when an AI service is unavailable. Get a consultation on the right architecture for your organization.

Free Consultation

Call 02-347-7730 | sale@grandlinux.com


About the Author

Paitoon Butri

Network & Server Security Specialist, Grand Linux Solution Co., Ltd.