Datadog's State of AI Engineering 2026 report opens with a stark headline: AI is hitting an infrastructure ceiling. The standout numbers: roughly 5% of AI requests in production fail, and over 60% of those failures trace back to system capacity issues. AI is failing not because models aren't smart enough, but because infrastructure can't keep up with demand. For Thai organizations planning to embed AI into their ERP, this is a clear signal: "Choosing an AI tool matters less than choosing an AI architecture."
Quick summary: What does Datadog's State of AI Engineering 2026 actually say?
- ~5% of AI requests fail in production — 1 in 20 user requests doesn't complete
- 60%+ of those failures stem from infrastructure capacity — not model bugs
- 69% of organizations run 3+ AI models simultaneously — complexity multiplies
- "Operational maturity gap" — adoption outpaces operational readiness
- Thailand adopts AI fast but operational readiness lags Singapore
- Datadog's takeaway: "AI winners won't be those with the best models — they'll be the ones who build the best operational controls around those models."
1. What Datadog Actually Found in State of AI 2026
Datadog is an observability platform that monitors customer systems worldwide (cloud, applications, AI services). The State of AI Engineering 2026 report draws on real production telemetry, not survey responses, which makes the numbers a faithful reflection of the market's actual infrastructure state.
| Area | Datadog Number | Meaning |
|---|---|---|
| Failure rate | ~5% of AI requests fail | 1 in 20 user requests hits an error or timeout |
| Primary failure cause | 60%+ from capacity issues | GPU/inference queues maxed out — not bugs |
| Multi-model adoption | 69% run 3+ models | Mixing OpenAI + Claude + open-source — not single-vendor |
| System complexity | Growing exponentially | Hard to monitor, harder to troubleshoot, vendor lock-in compounds |
| User impact | Slower responses, more errors, degraded UX | AI ROI evaporates as users abandon the feature |
5% failure sounds small — but if your ERP runs 10,000 AI requests per day, that's 500 failed requests per day in a system handling finance, accounting, and inventory decisions. That's a real operational risk that demands planning. See AI Adoption Gap for why Thai organizations adopt fast but struggle to scale.
2. Why Is AI Hitting the Infrastructure Ceiling? — Plain English
Here's the simple version. AI inference runs on GPUs (graphics processing units), which are extraordinarily expensive (an Nvidia H100 retails at USD 25,000-40,000 per card) and power-hungry. Building new GPU-rich data centers takes 1-2 years just for construction — plus the wait for hardware delivery.
But AI demand grows faster than that. The Stanford AI Index 2026 notes that training compute grows 4-5× per year, while infrastructure capacity expands only 2-3× per year. The result:
- Higher queue depth — your request waits in line for a free GPU
- Latency spikes — response times deteriorate during peak hours
- Vendor throttling — OpenAI/Anthropic may rate-limit even enterprise customers under heavy load
- Delayed model rollouts — new models ship but capacity isn't there to use them
This risk is concentrated in a handful of data centers — IEEE Spectrum reports most AI-grade data centers cluster in a few US locations. If one goes down, customers worldwide — Thailand included — are affected. See AI in ERP 2026 for more.
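Queue depth and vendor throttling can at least be absorbed at the application layer. Below is a minimal sketch, assuming a hypothetical `request_fn` that wraps a real SDK call and raises `RuntimeError` on a capacity or rate-limit error: retry with exponential backoff plus jitter, so that many queued clients don't all retry in lockstep against an already saturated service.

```python
import random
import time

def call_with_backoff(request_fn, max_retries=4, base_delay=1.0):
    """Retry a throttled AI call with exponential backoff and jitter.

    request_fn is a hypothetical zero-argument callable that raises
    RuntimeError when the provider returns a capacity/rate-limit error.
    """
    for attempt in range(max_retries + 1):
        try:
            return request_fn()
        except RuntimeError:
            if attempt == max_retries:
                raise  # capacity never freed up; let the fallback layer decide
            # Wait base, 2x base, 4x base, ... plus jitter to spread out retries
            time.sleep(base_delay * (2 ** attempt) + random.uniform(0, base_delay))
```

Backoff only buys time, though; if the provider stays saturated, the fallback question in section 3 still applies.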
3. 5 Architecture Decisions ERP Must Make Before Adding AI
Picking AI for your ERP is not "just plug in ChatGPT and call the API." There are 5 architectural decisions that determine reliability, cost, and data sovereignty:
| Decision | Options | Impact |
|---|---|---|
| 1. Inference location | Cloud (US/SG), on-premise GPU, or hybrid | Latency, monthly cost, data sovereignty |
| 2. Acceptable latency | Sync (<2 sec) or async (15+ sec OK) | Different use cases — chat must be fast, summary reports can wait |
| 3. SLA / capacity guarantee | Pay-as-you-go, reserved, or dedicated | Reserved costs more but guarantees throughput at peak |
| 4. Cost model | Per-token, flat monthly, or per-seat | Per-token is volatile — hard to budget |
| 5. Fallback / graceful degradation | Cache, smaller model, or disable feature | When AI lags or fails, the core ERP must keep running |
The question executives must ask their IT team — if OpenAI/Claude went down today, would our ERP still function? If the answer is "no," your architecture is too tightly coupled to one vendor. See Ollama Self-host Security for self-hosting options.
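One way to loosen that coupling is a thin provider abstraction: ERP code talks only to an interface, and vendor adapters plug in behind it. A minimal sketch, with hypothetical stand-in adapters (a real version would wrap the OpenAI, Anthropic, or a self-hosted client behind the same `complete()` method):

```python
class ModelProvider:
    """Interface so ERP code never imports a vendor SDK directly."""
    def complete(self, prompt: str) -> str:
        raise NotImplementedError

class EchoProvider(ModelProvider):
    """Hypothetical stand-in adapter used here instead of a real vendor client."""
    def __init__(self, name):
        self.name = name
    def complete(self, prompt):
        return f"[{self.name}] {prompt}"

class FailoverRouter(ModelProvider):
    """Try providers in order; the ERP sees only this router, never a vendor."""
    def __init__(self, providers):
        self.providers = providers
    def complete(self, prompt):
        last_error = None
        for provider in self.providers:
            try:
                return provider.complete(prompt)
            except Exception as err:
                last_error = err  # provider down or throttled; try the next one
        raise RuntimeError("all providers unavailable") from last_error
```

Swapping or adding a vendor then means writing one adapter, not rewriting every AI-touching workflow.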
4. On-premise vs Cloud — Trade-offs for AI in ERP
This is the trade-off every organization must consider. There's no single right answer — it depends on size, data sensitivity, and budget:
| Factor | On-premise GPU | Cloud AI Service |
|---|---|---|
| Capital cost (upfront) | High — GPUs + server room | Low — pay-as-you-go |
| Operating cost (long-term) | Stable — power + maintenance | Variable — can blow your budget |
| Latency | Lowest — on the LAN | Depends on data center (US 200+ ms) |
| Data sovereignty | Data never leaves the org — fits PDPA | Data crosses borders — DPA required |
| Capacity guarantee | Bound by hardware — slow to scale | Elastic — but subject to throttling |
| Available models | Open-source (Llama, Qwen, DeepSeek) | Frontier (GPT, Claude, Gemini) |
| Best fit for | Sensitive accounting/HR + steady load | General summarization + variable load |
In practice, many large organizations choose a hybrid approach — cloud for general tasks, on-premise for PDPA-sensitive or confidential data. See AI Investment ROI for how to think about the spend.
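A hybrid setup needs an explicit routing rule somewhere. A minimal sketch, assuming the ERP tags each request with its originating module, and that `on_prem_fn` and `cloud_fn` are hypothetical wrappers around a self-hosted model and a cloud API; the module list is illustrative, not a compliance recommendation:

```python
# Assumed PDPA-sensitive ERP modules (illustrative only)
SENSITIVE_MODULES = {"accounting", "hr", "payroll"}

def route_inference(module, payload, on_prem_fn, cloud_fn):
    """Send PDPA-sensitive workloads to the on-premise model;
    send general workloads to the elastic cloud service."""
    if module in SENSITIVE_MODULES:
        return on_prem_fn(payload)   # data never leaves the organization
    return cloud_fn(payload)         # elastic capacity for general tasks
```

The key point is that the routing decision is a single, auditable place in the codebase rather than a choice scattered across every feature.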
5. Questions to Ask Your Cloud ERP Vendor — Before You Sign
If your cloud ERP vendor says "we have an AI Assistant," ask these 7 follow-up questions. Most vendors won't be able to answer all of them:
7 questions for any Cloud ERP vendor:
- Where does your AI inference run? — If "Azure US" — your data leaves Thailand on every call
- What's the SLA on the AI service? — Core ERP may be 99.9%, but the AI sub-component might be only 99.0% (up to ~3.7 days of downtime per year)
- What's the fallback when AI is down? — If "none" — the feature isn't dependable
- How is AI billed? — Per-token billing explodes with heavy use
- Will our data train your models? — Demand a written guarantee
- If we cancel, are our data and embeddings fully deleted? — Critical for PDPA compliance
- Is an on-premise option available? — For organizations needing data sovereignty
If a vendor can't answer even one of these, they haven't thought architecture through deeply enough — and you risk becoming their "experimental customer." See AI Tools for Business for more.
6. Saeree ERP's Approach to AI Infrastructure
Saeree ERP is currently developing an AI Assistant for accounting, inventory, and HR (in training during 2026). Our approach is grounded in choosing architecture that fits Thai organizations:
| Area | Saeree ERP Approach |
|---|---|
| Deployment | Both on-premise + cloud — customers choose |
| AI architecture | ERP keeps working even if AI is down — AI is an add-on, not a critical path |
| Data sovereignty | On-premise → data never leaves the org, fits PDPA |
| Security | SSL A+, 2FA, role-based access — fit for sensitive data |
| Cost | Choose CAPEX (on-prem) or OPEX (cloud) — not forced into either |
| Vendor dependency | Open-source stack (PostgreSQL, Linux) — no single-vendor lock |
Our principle is simple — "AI inside ERP should be a tool that helps the team, not a critical dependency that brings the ERP down when the AI service falters."
7. What Thai Organizations Should Do Today
Before signing a cloud ERP contract that bakes AI features deeply in, do these 5 things:
- 1. Audit yourselves — list which workflows truly need AI (most don't)
- 2. Separate critical vs. nice-to-have — closing the books and issuing tax invoices = critical; executive summary reports = nice-to-have
- 3. Run a free trial — a good vendor offers 1-2 months before you commit
- 4. Measure real failure rates — log how often AI fails or lags during real use
- 5. Build fallbacks — every workflow needs a "manual mode" for when AI is unavailable
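Step 5 can be sketched as a wrapper that always completes the workflow: try the AI step, fall back to a cached answer, and finally drop to the manual (non-AI) procedure. `ai_fn` and `manual_fn` are hypothetical callables standing in for a real model client and the existing manual workflow.

```python
def run_workflow(query, ai_fn, manual_fn, cache):
    """Ensure the workflow completes even when the AI step is unavailable.

    Returns a (mode, result) pair so callers can see whether the answer
    came from the AI, from cache, or from the manual procedure.
    """
    try:
        result = ai_fn(query)
        cache[query] = result            # remember good answers for outages
        return ("ai", result)
    except Exception:
        if query in cache:
            return ("cache", cache[query])  # stale but usable
        return ("manual", manual_fn(query))  # core ERP keeps running
```

The returned mode also gives you the data for step 4: log it, and you can measure how often the AI path actually fails in real use.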
Summary
| Finding | ERP Lesson |
|---|---|
| 5% AI failure rate | Design ERP to keep running when AI fails |
| 60% from capacity | Choose architecture with explicit SLA / reserved capacity |
| 69% run 3+ models | Build an abstraction layer — avoid single-vendor coupling |
| Operational maturity gap | Invest in observability + monitoring from day 1 |
| Sensitive data | Consider on-premise for accounting/HR/PII workloads |
"Picking AI for your ERP isn't picking a tool — it's picking an architecture that will live with your organization for 5-10 years. If your cloud ERP vendor can't tell you where inference runs, what the SLA is, what the fallback looks like, or whether on-premise is even an option — you're about to become their experimental customer, not a customer who'll get a stable system."
References
- Dailynews — Datadog releases AI Engineering 2026 report (Thai)
- Datadog — State of AI Reports
- Stanford HAI — 2026 AI Index Report
- IEEE Spectrum — State of AI Index 2026 (Data Center Concentration)
How to Architect AI in ERP Without Over-Reliance on a Single Vendor
Saeree ERP supports both on-premise and cloud — designed so the ERP keeps working even when an AI service is unavailable. Get a consultation on the right architecture for your organization.
Free Consultation: Call 02-347-7730 | sale@grandlinux.com
