
Deploy OpenClaw to Production

Tags: Security, Monitoring, Scaling, Docker, Container  |  3 April

OpenClaw Deep Dive Series EP.5 — Following EP.4 where we built Multi-Agent Workflows with multiple AI agents collaborating as a team, it's now time to deploy OpenClaw to a real production environment! Running an Agent on a dev machine is very different from deploying it for multiple concurrent users — you need to think about Security, Monitoring, and Scaling to handle real-world conditions, just like self-hosting Ollama where you must prioritize security from day one.

Quick Summary — What You'll Learn in This Article:

  • Docker / Container Deployment — Containerize OpenClaw with Docker + docker-compose
  • Reverse Proxy + TLS — nginx + SSL Certificate to prevent eavesdropping
  • Authentication & Authorization — API Key, JWT, OAuth2/SSO + RBAC
  • Logging & Monitoring — Structured Logging + Prometheus + Grafana
  • Scaling Strategies — Vertical, Horizontal, Queue-based
  • Backup & Recovery — Backing up Agent configs, conversation history

Production vs Development — What's the Difference?

Many people run OpenClaw on localhost and think "it works fine" — but as soon as real users start accessing it, problems appear immediately: no authentication means anyone can access it, no monitoring means you won't know the system is down, no backup means data loss is permanent.

| Aspect | Development | Production |
|---|---|---|
| Network | localhost:3000, no TLS | Domain + Reverse Proxy + TLS (HTTPS) |
| Authentication | None — anyone can access | API Key / JWT / OAuth2 + RBAC |
| Users | Single user (developer) | Multiple concurrent users + Load Balancing |
| Monitoring | Reading console.log | Prometheus + Grafana + Alerts |
| Backup | None | Automated Backup + Disaster Recovery Plan |
| Deployment | npm start / node index.js | Docker Container + Orchestration |

Docker Deployment — Containerize OpenClaw

The first step in deployment is Containerization — running OpenClaw inside a Docker Container ensures a consistent environment everywhere, whether it's dev, staging, or production.

Dockerfile

# Dockerfile for OpenClaw Production
FROM node:20-alpine AS builder

WORKDIR /app
COPY package*.json ./
RUN npm ci --omit=dev   # --only=production is deprecated in npm 8+

COPY . .

# Multi-stage build — reduce image size
FROM node:20-alpine
WORKDIR /app

# Security: don't run as root
RUN addgroup -S openclaw && adduser -S openclaw -G openclaw

COPY --from=builder /app /app
RUN chown -R openclaw:openclaw /app

USER openclaw

EXPOSE 3000
HEALTHCHECK --interval=30s --timeout=10s --retries=3 \
  CMD wget --no-verbose --tries=1 --spider http://localhost:3000/health || exit 1

CMD ["node", "index.js"]

docker-compose.yml

# docker-compose.yml — OpenClaw + nginx + Redis
version: '3.8'

services:
  openclaw:
    build: .
    restart: unless-stopped
    env_file: .env
    networks:
      - openclaw-net
    depends_on:
      - redis
    healthcheck:
      test: ["CMD", "wget", "--spider", "http://localhost:3000/health"]
      interval: 30s
      timeout: 10s
      retries: 3

  nginx:
    image: nginx:alpine
    restart: unless-stopped
    ports:
      - "443:443"
      - "80:80"
    volumes:
      - ./nginx/nginx.conf:/etc/nginx/nginx.conf:ro
      - ./certs:/etc/nginx/certs:ro
    depends_on:
      - openclaw
    networks:
      - openclaw-net

  redis:
    image: redis:7-alpine
    restart: unless-stopped
    command: redis-server --requirepass ${REDIS_PASSWORD}
    volumes:
      - redis-data:/data
    networks:
      - openclaw-net

networks:
  openclaw-net:
    driver: bridge

volumes:
  redis-data:

Environment Variables (.env)

| Variable | Example Value | Description |
|---|---|---|
| NODE_ENV | production | Production mode (disables debug, enables optimization) |
| LLM_API_KEY | sk-xxxx... | API Key for LLM connection (OpenAI, Anthropic, etc.) |
| REDIS_PASSWORD | strong-random-password | Password for Redis (session + queue) |
| API_SECRET | jwt-secret-key | Secret for signing JWT tokens |
| LOG_LEVEL | info | Log level (debug, info, warn, error) |
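A missing variable is easier to catch at startup than in production at 3 AM. A small fail-fast validation sketch (the variable names come from the table above; the helper itself is hypothetical):

```javascript
// config.js — fail fast at startup if a required variable is missing
const REQUIRED_VARS = ['NODE_ENV', 'LLM_API_KEY', 'REDIS_PASSWORD', 'API_SECRET'];

function validateEnv(env = process.env) {
  const missing = REQUIRED_VARS.filter((name) => !env[name]);
  if (missing.length > 0) {
    throw new Error(`Missing required environment variables: ${missing.join(', ')}`);
  }
  return {
    nodeEnv: env.NODE_ENV,
    llmApiKey: env.LLM_API_KEY,
    redisPassword: env.REDIS_PASSWORD,
    apiSecret: env.API_SECRET,
    logLevel: env.LOG_LEVEL || 'info'  // optional, defaults to info
  };
}

module.exports = { validateEnv };
```

Call `validateEnv()` as the first line of `index.js` so a misconfigured container exits immediately and Docker's restart policy makes the problem visible, instead of failing later on the first request.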

Reverse Proxy + TLS

Never expose OpenClaw directly to the internet — always use a Reverse Proxy to manage TLS, rate limiting, and access control, just like configuring a self-hosted Ollama that requires nginx in front.

# nginx.conf — Reverse Proxy + TLS + WebSocket
# limit_req_zone must be declared in the http context, outside any server block
limit_req_zone $binary_remote_addr zone=api:10m rate=10r/s;

server {
    listen 80;
    server_name openclaw.example.com;
    return 301 https://$server_name$request_uri;
}

server {
    listen 443 ssl http2;
    server_name openclaw.example.com;

    # TLS — Let's Encrypt
    ssl_certificate     /etc/nginx/certs/fullchain.pem;
    ssl_certificate_key /etc/nginx/certs/privkey.pem;
    ssl_protocols       TLSv1.2 TLSv1.3;
    ssl_ciphers         HIGH:!aNULL:!MD5;

    # Security Headers
    add_header X-Frame-Options DENY;
    add_header X-Content-Type-Options nosniff;
    add_header X-XSS-Protection "1; mode=block";
    add_header Strict-Transport-Security "max-age=31536000" always;

    location / {
        # Rate Limiting
        limit_req zone=api burst=20 nodelay;
        proxy_pass http://openclaw:3000;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
    }

    # WebSocket for streaming responses
    location /ws {
        proxy_pass http://openclaw:3000;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
        proxy_read_timeout 86400;
    }
}

Alternatively, use Caddy for automatic TLS management — just a few lines:

# Caddyfile — Auto TLS
openclaw.example.com {
    reverse_proxy openclaw:3000
    encode gzip
}

Authentication & Authorization

An OpenClaw instance open to anyone is dangerous! You must have an identity verification system in place — choose the security level appropriate for your organization:

| Level | Method | Best For | Advantage |
|---|---|---|---|
| Basic | API Key in Header | Internal tools, small teams | Simple, quick to set up |
| Standard | JWT Token (per-user) | Medium teams, need user separation | User separation, audit log capability |
| Enterprise | OAuth2 / SSO (AD, LDAP) | Large organizations with Identity Provider | Centralized management, highest security |

Middleware: API Key Validation

// middleware/auth.js — API Key Validation

const VALID_API_KEYS = new Set(
  process.env.API_KEYS?.split(',') || []
);

function authenticateApiKey(req, res, next) {
  const apiKey = req.headers['x-api-key'];

  if (!apiKey) {
    return res.status(401).json({
      error: 'Missing API Key',
      message: 'Send API Key via header: X-API-Key'
    });
  }

  if (!VALID_API_KEYS.has(apiKey)) {
    console.warn(`[Auth] Invalid API Key attempt from ${req.ip}`);
    return res.status(403).json({
      error: 'Invalid API Key'
    });
  }

  // Identify user from API Key (for audit logging)
  req.user = { apiKey: apiKey.substring(0, 8) + '...' };
  next();
}

module.exports = { authenticateApiKey };

RBAC — Role-Based Access Control

| Role | Permissions | Example Users |
|---|---|---|
| Admin | Full Access — manage Agents, Skills, Modules, Users | DevOps, System Administrators |
| User | Chat + use authorized Agents (cannot modify config) | General employees |
| ReadOnly | View logs + dashboard only (cannot chat) | Executives, Auditors |
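The roles above can be enforced with a small middleware sketch. The role and permission names mirror the table; the helper assumes an earlier auth middleware has already set `req.user.role`:

```javascript
// middleware/rbac.js — role check; assumes auth middleware set req.user.role
const ROLE_PERMISSIONS = {
  admin: ['chat', 'manage', 'view'],
  user: ['chat', 'view'],
  readonly: ['view']
};

function requirePermission(permission) {
  return function (req, res, next) {
    const role = req.user && req.user.role;
    const allowed = ROLE_PERMISSIONS[role] || [];
    if (!allowed.includes(permission)) {
      return res.status(403).json({
        error: `Role '${role}' is not allowed to '${permission}'`
      });
    }
    next();
  };
}

module.exports = { requirePermission, ROLE_PERMISSIONS };
```

Usage would look like `app.post('/api/agents', requirePermission('manage'), handler)`, so a ReadOnly token that tries to modify an Agent gets a 403 before the handler runs.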

Using Digital Signatures alongside API Keys provides additional protection against request forgery, especially for organizations requiring high-level security.

Logging & Monitoring

If you don't monitor, you won't know there's a problem — Production must have proper logging from day one for effective risk management.

What to Log

  • User messages — who sent what and when (anonymize if PII is present)
  • Agent actions — which Agent did what and how long it took
  • Tool calls — which Skill/Module was called and what was the result
  • Errors — every error with full stack trace
  • Response times — how long each request took

Structured Logging (JSON)

// logger.js — Structured JSON Logging
const winston = require('winston');

const logger = winston.createLogger({
  level: process.env.LOG_LEVEL || 'info',
  format: winston.format.combine(
    winston.format.timestamp(),
    winston.format.json()
  ),
  defaultMeta: { service: 'openclaw-production' },
  transports: [
    new winston.transports.File({ filename: 'logs/error.log', level: 'error' }),
    new winston.transports.File({ filename: 'logs/combined.log' })
  ]
});

// Example log entry:
// {
//   "timestamp": "2026-04-03T09:15:30.123Z",
//   "level": "info",
//   "service": "openclaw-production",
//   "message": "Agent completed task",
//   "agent": "research-agent",
//   "userId": "user-abc",
//   "duration_ms": 2340,
//   "tokens_used": 1520,
//   "status": "success"
// }

Key Metrics to Monitor

| Metric | Target | Alert When |
|---|---|---|
| Requests/sec | < 100 req/s | > 80 req/s (approaching capacity) |
| Response Time (p95) | < 5 seconds | > 10 seconds (too slow) |
| Error Rate | < 1% | > 5% (serious issue) |
| Token Usage/day | Within set budget | > 80% of daily budget |
| Cost per Request | Within baseline | Increases > 50% from baseline |

We recommend using Prometheus for metric collection + Grafana for dashboards + alert rules that notify via Slack/Email when values are abnormal.
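In Node, the prom-client package is the usual way to expose metrics; to show what Prometheus actually scrapes from `/metrics`, here is a hand-rolled sketch of the text exposition format (the metric names are examples, not OpenClaw's built-in metrics):

```javascript
// metrics.js — minimal Prometheus text exposition format sketch
const counters = new Map();

// Increment a counter, keyed by metric name + label set
function incCounter(name, labels = {}, value = 1) {
  const labelStr = Object.entries(labels)
    .map(([k, v]) => `${k}="${v}"`).join(',');
  const key = labelStr ? `${name}{${labelStr}}` : name;
  counters.set(key, (counters.get(key) || 0) + value);
}

// Render all counters in the format Prometheus scrapes
function renderMetrics() {
  const lines = [];
  const seen = new Set();
  for (const [key, value] of counters) {
    const name = key.split('{')[0];
    if (!seen.has(name)) {
      lines.push(`# TYPE ${name} counter`);  // one TYPE line per metric name
      seen.add(name);
    }
    lines.push(`${key} ${value}`);
  }
  return lines.join('\n') + '\n';
}

module.exports = { incCounter, renderMetrics };
```

Serving `renderMetrics()` with `Content-Type: text/plain` at `/metrics` is enough for a Prometheus scrape job to start collecting the table's metrics.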

Scaling Strategies — Handling More Users

As users increase, a single OpenClaw instance may not be enough — there are 3 main strategies for scaling:

| Strategy | Method | Best For | Limitation |
|---|---|---|---|
| Vertical Scaling | Add more CPU / RAM to existing machine | Starting out — simplest approach, no code changes | Has a ceiling, cannot scale indefinitely |
| Horizontal Scaling | Add more instances + Load Balancer | Many users, need high availability | Requires session sharing management (Redis) |
| Queue-based | Redis Queue for async processing | Long-running AI tasks, non-real-time | Not suitable for real-time chat |

Architecture: Horizontal Scaling

# Architecture — Horizontal Scaling with Load Balancer

User Request
    |
    v
[nginx Load Balancer] (Round Robin / Least Connections)
    |
    |--> [OpenClaw Instance 1] -->\
    |--> [OpenClaw Instance 2] --> [Shared Redis] --> [LLM API]
    |--> [OpenClaw Instance 3] -->/
    |
    v
[Prometheus + Grafana] (Monitoring)
[Log Aggregator] (ELK / Loki)

All instances share Redis for sessions, queues, and caching — this allows users to interact with any instance without losing data.
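The queue-based strategy can be sketched with an in-process promise chain; a real deployment would back the queue with Redis (for example via BullMQ) so all instances share the same job list, but the processing flow is the same idea:

```javascript
// queue-sketch.js — sequential async job processing (in-memory stand-in
// for a Redis-backed queue; jobs run one at a time, in submission order)
class TaskQueue {
  constructor(worker) {
    this.worker = worker;            // async function that processes one job
    this.chain = Promise.resolve();  // tail of the processing chain
  }

  // Returns a promise that resolves when THIS job has finished
  enqueue(job) {
    this.chain = this.chain
      .then(() => this.worker(job))
      .catch((err) => {
        // in production: retry with backoff, then a dead-letter queue
        console.error('Job failed:', err.message);
      });
    return this.chain;
  }
}

module.exports = { TaskQueue };
```

The caller gets a job ID back immediately and polls (or receives a WebSocket push) for the result — which is exactly why this pattern suits long-running AI tasks but not real-time chat.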

Backup & Recovery

AI Agent data is more important than you might think — Agent configs, Skill definitions, and conversation history are difficult to recreate once lost. You need a clear Disaster Recovery plan.

What to Backup

  • Agent configs — YAML/JSON files defining agents, prompts, models
  • Skills & Modules — custom code you've written
  • Conversation history — chat history (if stored)
  • .env file — environment variables (store separately, never in git)
  • Redis data — sessions, cache, queue

Backup Script (Cron Job)

#!/bin/bash
# backup-openclaw.sh — runs daily at 2 AM via cron
# crontab: 0 2 * * * /opt/openclaw/backup-openclaw.sh
set -euo pipefail

OPENCLAW_DIR="/opt/openclaw"
BACKUP_DIR="/backup/openclaw/$(date +%Y-%m-%d)"

# cron runs with a minimal environment — load REDIS_PASSWORD from .env
source "$OPENCLAW_DIR/.env"

mkdir -p "$BACKUP_DIR"

# 1. Backup Agent configs + Skills
cp -r "$OPENCLAW_DIR/agents" "$BACKUP_DIR/"
cp -r "$OPENCLAW_DIR/skills" "$BACKUP_DIR/"
cp -r "$OPENCLAW_DIR/modules" "$BACKUP_DIR/"

# 2. Backup conversation history (Redis RDB)
docker exec openclaw-redis redis-cli -a "$REDIS_PASSWORD" BGSAVE
sleep 5   # give BGSAVE a moment; poll LASTSAVE for a more robust check
docker cp openclaw-redis:/data/dump.rdb "$BACKUP_DIR/redis-dump.rdb"

# 3. Backup .env (encrypted) — --batch + passphrase file so gpg works
#    non-interactively under cron (the passphrase path is an example)
gpg --batch --symmetric --cipher-algo AES256 \
    --passphrase-file /root/.backup-passphrase \
    -o "$BACKUP_DIR/env.gpg" "$OPENCLAW_DIR/.env"

# 4. Compress (-C keeps paths in the archive relative, not /backup/...)
tar -czf "$BACKUP_DIR.tar.gz" -C "$(dirname "$BACKUP_DIR")" "$(basename "$BACKUP_DIR")"
rm -rf "$BACKUP_DIR"

# 5. Delete backups older than 30 days
find /backup/openclaw/ -name "*.tar.gz" -mtime +30 -delete

echo "[Backup] Completed: $BACKUP_DIR.tar.gz"

Critical Production Mistakes to Avoid:

  • Never expose OpenClaw directly to the internet — always go through a Reverse Proxy, which provides better protection against attacks.
  • Never store API Keys in code or commit them to git — use .env files or a Secret Manager (Vault, AWS Secrets).
  • Never skip monitoring — without monitoring, you won't know the system is down until a user calls to complain.
  • Rotate credentials regularly — change API Keys, JWT Secrets, Redis Passwords every 90 days.
  • Never use raw SQL queries — if Agents connect to a database, always use parameterized queries.

Saeree ERP + Production AI:

Saeree ERP is developing an AI Assistant designed for production from the ground up — supporting on-premise deployment with SSL A+ rating, 2FA authentication, automated backups, and comprehensive monitoring. Interested in an ERP system ready for AI in the future? Consult with our team for free


"Deploying an AI Agent to production isn't just running npm start — you need to think about Security, Monitoring, and Scaling just like every system that serves real users."

- Saeree ERP Team


About the Author

Paitoon Butri

Network & Server Security Specialist, Grand Linux Solution Co., Ltd.