3 April
OpenClaw Deep Dive Series EP.5 — Following EP.4 where we built Multi-Agent Workflows with multiple AI agents collaborating as a team, it's now time to deploy OpenClaw to a real production environment! Running an Agent on a dev machine is very different from deploying it for multiple concurrent users — you need to think about Security, Monitoring, and Scaling to handle real-world conditions, just like self-hosting Ollama where you must prioritize security from day one.
Quick Summary — What You'll Learn in This Article:
- Docker / Container Deployment — Containerize OpenClaw with Docker + docker-compose
- Reverse Proxy + TLS — nginx + SSL Certificate to prevent eavesdropping
- Authentication & Authorization — API Key, JWT, OAuth2/SSO + RBAC
- Logging & Monitoring — Structured Logging + Prometheus + Grafana
- Scaling Strategies — Vertical, Horizontal, Queue-based
- Backup & Recovery — Backing up Agent configs, conversation history
Production vs Development — What's the Difference?
Many people run OpenClaw on localhost and think "it works fine" — but as soon as real users start accessing it, problems appear immediately: no authentication means anyone can access it, no monitoring means you won't know the system is down, no backup means data loss is permanent.
| Aspect | Development | Production |
|---|---|---|
| Network | localhost:3000, no TLS | Domain + Reverse Proxy + TLS (HTTPS) |
| Authentication | None — anyone can access | API Key / JWT / OAuth2 + RBAC |
| Users | Single user (developer) | Multiple concurrent users + Load Balancing |
| Monitoring | Reading console.log | Prometheus + Grafana + Alerts |
| Backup | None | Automated Backup + Disaster Recovery Plan |
| Deployment | npm start / node index.js | Docker Container + Orchestration |
Docker Deployment — Containerize OpenClaw
The first step in deployment is Containerization — running OpenClaw inside a Docker Container ensures a consistent environment everywhere, whether it's dev, staging, or production.
Dockerfile
```dockerfile
# Dockerfile for OpenClaw Production
FROM node:20-alpine AS builder
WORKDIR /app
COPY package*.json ./
# npm ci --only=production is deprecated; --omit=dev is the current flag
RUN npm ci --omit=dev
COPY . .

# Multi-stage build — reduce image size
FROM node:20-alpine
WORKDIR /app

# Security: don't run as root
RUN addgroup -S openclaw && adduser -S openclaw -G openclaw
COPY --from=builder /app /app
RUN chown -R openclaw:openclaw /app
USER openclaw

EXPOSE 3000
HEALTHCHECK --interval=30s --timeout=10s --retries=3 \
  CMD wget --no-verbose --tries=1 --spider http://localhost:3000/health || exit 1
CMD ["node", "index.js"]
```
docker-compose.yml
```yaml
# docker-compose.yml — OpenClaw + nginx + Redis
version: '3.8'

services:
  openclaw:
    build: .
    restart: unless-stopped
    env_file: .env
    networks:
      - openclaw-net
    depends_on:
      - redis
    healthcheck:
      test: ["CMD", "wget", "--spider", "http://localhost:3000/health"]
      interval: 30s
      timeout: 10s
      retries: 3

  nginx:
    image: nginx:alpine
    restart: unless-stopped
    ports:
      - "443:443"
      - "80:80"
    volumes:
      - ./nginx/nginx.conf:/etc/nginx/nginx.conf:ro
      - ./certs:/etc/nginx/certs:ro
    depends_on:
      - openclaw
    networks:
      - openclaw-net

  redis:
    image: redis:7-alpine
    restart: unless-stopped
    command: redis-server --requirepass ${REDIS_PASSWORD}
    volumes:
      - redis-data:/data
    networks:
      - openclaw-net

networks:
  openclaw-net:
    driver: bridge

volumes:
  redis-data:
```
Environment Variables (.env)
| Variable | Example Value | Description |
|---|---|---|
| NODE_ENV | production | Production mode (disables debug, enables optimization) |
| LLM_API_KEY | sk-xxxx... | API Key for LLM connection (OpenAI, Anthropic, etc.) |
| REDIS_PASSWORD | strong-random-password | Password for Redis (session + queue) |
| API_SECRET | jwt-secret-key | Secret for signing JWT tokens |
| LOG_LEVEL | info | Log level (debug, info, warn, error) |
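A process that starts with a variable missing tends to fail mid-request in confusing ways, so it is worth validating the table above once at boot. A small sketch (variable names follow the table; the `validateEnv` helper is our own invention, not part of OpenClaw):

```javascript
// config.js — fail-fast validation of the environment variables listed above
// (sketch: validateEnv is a hypothetical helper, not an OpenClaw API)
const REQUIRED_VARS = ['NODE_ENV', 'LLM_API_KEY', 'REDIS_PASSWORD', 'API_SECRET'];

function validateEnv(env = process.env) {
  const missing = REQUIRED_VARS.filter((name) => !env[name]);
  if (missing.length > 0) {
    // Crash at startup, not on the first user request
    throw new Error(`Missing required environment variables: ${missing.join(', ')}`);
  }
  return {
    nodeEnv: env.NODE_ENV,
    llmApiKey: env.LLM_API_KEY,
    redisPassword: env.REDIS_PASSWORD,
    apiSecret: env.API_SECRET,
    logLevel: env.LOG_LEVEL || 'info',  // optional, defaults to info
  };
}

module.exports = { validateEnv };
```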
Reverse Proxy + TLS
Never expose OpenClaw directly to the internet — always use a Reverse Proxy to manage TLS, rate limiting, and access control, just like configuring a self-hosted Ollama that requires nginx in front.
```nginx
# nginx.conf — Reverse Proxy + TLS + WebSocket
# Rate limiting zone — must be declared at the http {} level, not inside server {}
limit_req_zone $binary_remote_addr zone=api:10m rate=10r/s;

server {
    listen 80;
    server_name openclaw.example.com;
    return 301 https://$server_name$request_uri;
}

server {
    listen 443 ssl http2;
    server_name openclaw.example.com;

    # TLS — Let's Encrypt
    ssl_certificate     /etc/nginx/certs/fullchain.pem;
    ssl_certificate_key /etc/nginx/certs/privkey.pem;
    ssl_protocols TLSv1.2 TLSv1.3;
    ssl_ciphers HIGH:!aNULL:!MD5;

    # Security Headers
    add_header X-Frame-Options DENY;
    add_header X-Content-Type-Options nosniff;
    add_header Strict-Transport-Security "max-age=31536000" always;

    # Rate Limiting
    location / {
        limit_req zone=api burst=20 nodelay;
        proxy_pass http://openclaw:3000;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
    }

    # WebSocket for streaming responses
    location /ws {
        proxy_pass http://openclaw:3000;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
        proxy_read_timeout 86400;
    }
}
```
Alternatively, use Caddy for automatic TLS management — just a few lines:
```
# Caddyfile — Auto TLS
openclaw.example.com {
    reverse_proxy openclaw:3000
    encode gzip
}
```
Authentication & Authorization
An OpenClaw instance open to anyone is dangerous! You must have an identity verification system in place — choose the security level appropriate for your organization:
| Level | Method | Best For | Advantage |
|---|---|---|---|
| Basic | API Key in Header | Internal tools, small teams | Simple, quick to set up |
| Standard | JWT Token (per-user) | Medium teams, need user separation | User separation, audit log capability |
| Enterprise | OAuth2 / SSO (AD, LDAP) | Large organizations with Identity Provider | Centralized management, highest security |
Middleware: API Key Validation
```javascript
// middleware/auth.js — API Key Validation
const VALID_API_KEYS = new Set(
  process.env.API_KEYS?.split(',') || []
);

function authenticateApiKey(req, res, next) {
  const apiKey = req.headers['x-api-key'];
  if (!apiKey) {
    return res.status(401).json({
      error: 'Missing API Key',
      message: 'Send API Key via header: X-API-Key'
    });
  }
  if (!VALID_API_KEYS.has(apiKey)) {
    console.warn(`[Auth] Invalid API Key attempt from ${req.ip}`);
    return res.status(403).json({
      error: 'Invalid API Key'
    });
  }
  // Identify user from API Key (for audit logging)
  req.user = { apiKey: apiKey.substring(0, 8) + '...' };
  next();
}

module.exports = { authenticateApiKey };
```
RBAC — Role-Based Access Control
| Role | Permissions | Example Users |
|---|---|---|
| Admin | Full Access — manage Agents, Skills, Modules, Users | DevOps, System Administrators |
| User | Chat + use authorized Agents (cannot modify config) | General employees |
| ReadOnly | View logs + dashboard only (cannot chat) | Executives, Auditors |
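A role check composes naturally with the authentication middleware above. A hedged sketch (the `requireRole` helper and the lowercase role names are our own convention; it assumes an earlier middleware has already set `req.user.role`):

```javascript
// middleware/rbac.js — role check matching the table above
// (sketch: requireRole and the rank mapping are our own convention)
const ROLE_RANK = { readonly: 0, user: 1, admin: 2 };

function requireRole(minRole) {
  return function (req, res, next) {
    const role = req.user && req.user.role;
    if (role === undefined || ROLE_RANK[role] === undefined) {
      return res.status(401).json({ error: 'Unauthenticated or unknown role' });
    }
    if (ROLE_RANK[role] < ROLE_RANK[minRole]) {
      return res.status(403).json({ error: `Requires role: ${minRole}` });
    }
    next();
  };
}

module.exports = { requireRole };
```

Usage would look like `app.post('/agents', authenticateApiKey, requireRole('admin'), handler)`, so config changes stay Admin-only while chat routes only need `requireRole('user')`.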
Using Digital Signatures alongside API Keys provides additional protection against request forgery, especially for organizations requiring high-level security.
Logging & Monitoring
If you don't monitor, you won't know there's a problem — Production must have proper logging from day one for effective risk management.
What to Log
- User messages — who sent what and when (anonymize if PII is present)
- Agent actions — which Agent did what and how long it took
- Tool calls — which Skill/Module was called and what was the result
- Errors — every error with full stack trace
- Response times — how long each request took
Structured Logging (JSON)
```javascript
// logger.js — Structured JSON Logging
const winston = require('winston');

const logger = winston.createLogger({
  level: process.env.LOG_LEVEL || 'info',
  format: winston.format.combine(
    winston.format.timestamp(),
    winston.format.json()
  ),
  defaultMeta: { service: 'openclaw-production' },
  transports: [
    new winston.transports.File({ filename: 'logs/error.log', level: 'error' }),
    new winston.transports.File({ filename: 'logs/combined.log' })
  ]
});

module.exports = logger;

// Example log entry:
// {
//   "timestamp": "2026-04-03T09:15:30.123Z",
//   "level": "info",
//   "service": "openclaw-production",
//   "message": "Agent completed task",
//   "agent": "research-agent",
//   "userId": "user-abc",
//   "duration_ms": 2340,
//   "tokens_used": 1520,
//   "status": "success"
// }
```
Key Metrics to Monitor
| Metric | Target | Alert When |
|---|---|---|
| Requests/sec | < 100 req/s | > 80 req/s (approaching capacity) |
| Response Time (p95) | < 5 seconds | > 10 seconds (too slow) |
| Error Rate | < 1% | > 5% (serious issue) |
| Token Usage/day | Within set budget | > 80% of daily budget |
| Cost per Request | Within baseline | Increases > 50% from baseline |
We recommend using Prometheus for metric collection + Grafana for dashboards + alert rules that notify via Slack/Email when values are abnormal.
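To see the shape of what Prometheus scrapes before wiring in a client library, here is a deliberately minimal sketch that tracks the request and error counters plus a p95 gauge from the table above and renders them in Prometheus text exposition format (in practice the prom-client package does this properly, with real histogram buckets; the metric names here are our own):

```javascript
// metrics.js — minimal Prometheus text-format exposition
// (sketch: use prom-client in production; metric names are our own)
const state = { requestsTotal: 0, errorsTotal: 0, durationsMs: [] };

function observeRequest(durationMs, isError) {
  state.requestsTotal += 1;
  if (isError) state.errorsTotal += 1;
  state.durationsMs.push(durationMs);
}

function percentile(sorted, p) {
  if (sorted.length === 0) return 0;
  const idx = Math.min(sorted.length - 1, Math.floor(p * sorted.length));
  return sorted[idx];
}

function renderPrometheus() {
  const sorted = [...state.durationsMs].sort((a, b) => a - b);
  return [
    '# TYPE openclaw_requests_total counter',
    `openclaw_requests_total ${state.requestsTotal}`,
    '# TYPE openclaw_errors_total counter',
    `openclaw_errors_total ${state.errorsTotal}`,
    '# TYPE openclaw_request_duration_p95_ms gauge',
    `openclaw_request_duration_p95_ms ${percentile(sorted, 0.95)}`,
  ].join('\n');
}

module.exports = { observeRequest, renderPrometheus };
```

Expose `renderPrometheus()` on a `/metrics` route and point a Prometheus scrape job at it; Grafana then reads from Prometheus.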
Scaling Strategies — Handling More Users
As users increase, a single OpenClaw instance may not be enough — there are 3 main strategies for scaling:
| Strategy | Method | Best For | Limitation |
|---|---|---|---|
| Vertical Scaling | Add more CPU / RAM to existing machine | Starting out, simplest approach, no code changes | Has a ceiling, cannot scale indefinitely |
| Horizontal Scaling | Add more instances + Load Balancer | Many users, need high availability | Requires session sharing management (Redis) |
| Queue-based | Redis Queue for async processing | Long-running AI tasks, non-real-time | Not suitable for real-time chat |
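The queue-based strategy boils down to bounding how many long-running AI tasks execute at once and buffering the rest. The sketch below shows the pattern in-process only; a production setup would back the queue with Redis (for example via a library like BullMQ) so jobs survive restarts and can be shared across instances:

```javascript
// queue.js — in-process task queue illustrating the queue-based pattern
// (sketch: production queues live in Redis; TaskQueue is our own illustration)
class TaskQueue {
  constructor(handler, concurrency = 2) {
    this.handler = handler;        // async function doing the long-running AI work
    this.concurrency = concurrency;
    this.pending = [];
    this.active = 0;
  }

  enqueue(task) {
    return new Promise((resolve, reject) => {
      this.pending.push({ task, resolve, reject });
      this.drain();
    });
  }

  drain() {
    // Start waiting tasks until the concurrency limit is reached
    while (this.active < this.concurrency && this.pending.length > 0) {
      const { task, resolve, reject } = this.pending.shift();
      this.active += 1;
      Promise.resolve()
        .then(() => this.handler(task))
        .then(resolve, reject)
        .finally(() => { this.active -= 1; this.drain(); });
    }
  }
}

module.exports = { TaskQueue };
```

The concurrency limit is the knob that protects your LLM API budget and rate limits while requests keep arriving.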
Architecture: Horizontal Scaling
```
# Architecture — Horizontal Scaling with Load Balancer

        User Request
             |
             v
  [nginx Load Balancer]  (Round Robin / Least Connections)
             |
             |--> [OpenClaw Instance 1] --\
             |--> [OpenClaw Instance 2] ----> [Shared Redis] --> [LLM API]
             |--> [OpenClaw Instance 3] --/
             |
             v
  [Prometheus + Grafana]  (Monitoring)
  [Log Aggregator]        (ELK / Loki)
```
All instances share Redis for sessions, queues, and caching — this allows users to interact with any instance without losing data.
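To point the nginx load balancer at multiple instances, the single `proxy_pass` target becomes an upstream group. A hypothetical fragment (the instance hostnames are assumptions; they would match your compose service names or replicas):

```nginx
# Hypothetical upstream group for nginx.conf: three OpenClaw instances
upstream openclaw_cluster {
    least_conn;                 # route to the instance with fewest active connections
    server openclaw-1:3000;
    server openclaw-2:3000;
    server openclaw-3:3000;
}

# Inside the server block, proxy to the group instead of a single container:
#   proxy_pass http://openclaw_cluster;
```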
Backup & Recovery
AI Agent data is more important than you might think — Agent configs, Skill definitions, and conversation history are difficult to recreate once lost. You need a clear Disaster Recovery plan.
What to Backup
- Agent configs — YAML/JSON files defining agents, prompts, models
- Skills & Modules — custom code you've written
- Conversation history — chat history (if stored)
- .env file — environment variables (store separately, never in git)
- Redis data — sessions, cache, queue
Backup Script (Cron Job)
```bash
#!/bin/bash
# backup-openclaw.sh — runs daily at 2 AM via cron
# crontab: 0 2 * * * /opt/openclaw/backup-openclaw.sh
set -euo pipefail

BACKUP_DIR="/backup/openclaw/$(date +%Y-%m-%d)"
OPENCLAW_DIR="/opt/openclaw"

# Cron runs with a minimal environment, so load REDIS_PASSWORD from .env
source "$OPENCLAW_DIR/.env"

mkdir -p "$BACKUP_DIR"

# 1. Backup Agent configs + Skills
cp -r "$OPENCLAW_DIR/agents"  "$BACKUP_DIR/"
cp -r "$OPENCLAW_DIR/skills"  "$BACKUP_DIR/"
cp -r "$OPENCLAW_DIR/modules" "$BACKUP_DIR/"

# 2. Backup conversation history (Redis RDB)
# "openclaw-redis" assumes container_name is set in docker-compose.yml
docker exec openclaw-redis redis-cli -a "$REDIS_PASSWORD" BGSAVE
sleep 5   # BGSAVE is async; enough for small datasets, poll LASTSAVE for large ones
docker cp openclaw-redis:/data/dump.rdb "$BACKUP_DIR/redis-dump.rdb"

# 3. Backup .env (encrypted)
gpg --symmetric --cipher-algo AES256 -o "$BACKUP_DIR/env.gpg" "$OPENCLAW_DIR/.env"

# 4. Compress (use -C so the archive holds relative paths)
tar -czf "$BACKUP_DIR.tar.gz" -C "$(dirname "$BACKUP_DIR")" "$(basename "$BACKUP_DIR")"
rm -rf "$BACKUP_DIR"

# 5. Delete backups older than 30 days
find /backup/openclaw/ -name "*.tar.gz" -mtime +30 -delete

echo "[Backup] Completed: $BACKUP_DIR.tar.gz"
```
Critical Production Mistakes to Avoid:
- Never expose OpenClaw directly to the internet — always go through a Reverse Proxy, which provides better protection against attacks.
- Never store API Keys in code or commit them to git — use .env files or a Secret Manager (Vault, AWS Secrets Manager).
- Never skip monitoring — without monitoring, you won't know the system is down until a user calls to complain.
- Rotate credentials regularly — change API Keys, JWT Secrets, Redis Passwords every 90 days.
- Never use raw SQL queries — if Agents connect to a database, always use parameterized queries.
Saeree ERP + Production AI:
Saeree ERP is developing an AI Assistant designed for production from the ground up — supporting on-premise deployment with SSL A+ rating, 2FA authentication, automated backups, and comprehensive monitoring. Interested in an ERP system ready for AI in the future? Consult with our team for free.
OpenClaw Deep Dive Series — Read More
OpenClaw Deep Dive Series — 6 Episodes of In-Depth AI Agent Exploration:
- EP.1: Install OpenClaw and Build Your First AI Agent
- EP.2: OpenClaw Skills Deep Dive — Build Custom Skills from Scratch
- EP.3: OpenClaw Kernel Module Hands-On — Write Your Own Module Step by Step
- EP.4: Build Multi-Agent Workflows with OpenClaw — Design & Implement
- EP.5: Deploy OpenClaw to Production — Security, Monitoring, Scaling (this article)
- EP.6: Connect OpenClaw to ERP — Build an AI Assistant for Your Organization
"Deploying an AI Agent to production isn't just running npm start — you need to think about Security, Monitoring, and Scaling just like every system that serves real users."
- Saeree ERP Team