
Self-Host AI Security

Secure Self-Hosted AI
Tags: Security Best Practices, Ollama, nginx, Docker, Firewall
3 April

Ollama Series EP.6 — From EP.5 where we connected Ollama API to apps and enterprise systems, you may have noticed that Ollama has no built-in Authentication system at all — if you set OLLAMA_HOST=0.0.0.0 to allow other machines to connect, anyone on the network can use your AI instantly. No password required, no API Key, no protection whatsoever! EP.6 will walk you through securing your Ollama Self-Host on every front — from Reverse Proxy, Authentication, Firewall, TLS, Docker Isolation to Monitoring.

In short — What do you need to secure Ollama Self-Host?

  • Reverse Proxy + TLS — Use nginx/Caddy as a gateway with HTTPS encryption
  • Authentication — Add Basic Auth / API Key / OAuth for identity verification
  • Firewall — Close unnecessary ports, open only what is needed
  • Docker Isolation — Run inside a Container with limited resources
  • Rate Limiting — Prevent abuse and DDoS attacks
  • Monitoring & Logging — Detect anomalies in real time

Why Does Self-Hosted AI Need Security?

When you run Ollama on your own machine, all data stays within your organization's network — which is great for privacy. However, poor configuration can turn into a major vulnerability because Ollama is designed to be developer-friendly rather than security-focused from the start. The result: default settings have no protection at all:

| Risk | Cause | Impact | Solution |
|------|-------|--------|----------|
| No Authentication | Ollama has no built-in Auth | Anyone can use the AI and consume GPU for free | Add Auth Layer via Reverse Proxy |
| No TLS/HTTPS | API sends data as Plain Text | Data can be sniffed in transit | Set up TLS Certificate via nginx/Caddy |
| No Rate Limit | No limit on number of requests | Server crashes from request flooding / GPU at 100% | Configure Rate Limit in nginx |
| Port open to network | Setting OLLAMA_HOST=0.0.0.0 | Every machine on the network can access it | Bind localhost + Firewall |
| No Monitoring | No logs or alerts | No visibility into who is using it, when, or how much | Set up Access Log + Prometheus |

According to Shodan (a search engine for internet-connected devices), over 7,000 Ollama servers worldwide have Port 11434 open to the Internet without any Authentication. This means anyone can use those machines' GPUs to run AI for free — or even download/delete models from them. This is the same issue we discussed in our Two-Factor Authentication article: a single password alone is not enough — you need multiple layers of protection.
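You can check your own exposure in under a minute. A quick sketch (the IP address below is a documentation placeholder, substitute your server's public address):

```shell
# From OUTSIDE your network: replace 203.0.113.10 with your server's
# public IP (placeholder). A JSON model list in the response means
# the server is wide open:
curl --max-time 5 http://203.0.113.10:11434/api/tags

# On the server itself, check which address Ollama is bound to:
ss -tlnp | grep 11434
# 127.0.0.1:11434 = localhost only (safe); 0.0.0.0:11434 = exposed
```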

10-Point Checklist — Secure Ollama Self-Host

Before diving into the details of each item, let's look at the overall Checklist:

| # | Item | Priority | Details |
|---|------|----------|---------|
| 1 | Bind localhost only | Critical | Set OLLAMA_HOST=127.0.0.1 — don't expose to the network directly |
| 2 | Reverse Proxy + TLS | Critical | Use nginx/Caddy + SSL Certificate to encrypt all requests |
| 3 | Authentication | Critical | Add Basic Auth / API Key / OAuth for identity verification before access |
| 4 | Firewall Rules | Critical | Use ufw/iptables to open only necessary ports |
| 5 | Docker Isolation | High | Run in a Container with Memory/CPU limits — won't affect the Host |
| 6 | Rate Limiting | High | Limit requests/second to prevent abuse and DDoS |
| 7 | Network Segmentation | High | Separate VLAN / Use VPN for machines that need AI access |
| 8 | Logging & Monitoring | High | Collect Access Logs + Prometheus/Grafana for Metrics |
| 9 | Regular Updates | Medium | Update Ollama regularly (re-run the install script or pull the latest Docker image) to receive security patches |
| 10 | Backup Model Data | Medium | Back up Model + Modelfile data regularly |
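For item 1, if you installed Ollama with the official Linux install script (which registers it as a systemd service), a systemd drop-in is the usual place to pin the bind address. A sketch, assuming that service layout:

```shell
# Create a systemd override that binds Ollama to localhost only
sudo mkdir -p /etc/systemd/system/ollama.service.d
sudo tee /etc/systemd/system/ollama.service.d/override.conf <<'EOF'
[Service]
Environment="OLLAMA_HOST=127.0.0.1:11434"
EOF

# Apply the change
sudo systemctl daemon-reload
sudo systemctl restart ollama
```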

Reverse Proxy + TLS — nginx Configuration

Instead of exposing Ollama's Port 11434 directly to other machines, use nginx as a gateway (Reverse Proxy) and add an SSL Certificate to encrypt communications:

# /etc/nginx/sites-available/ollama.conf

upstream ollama {
    server 127.0.0.1:11434;
}

server {
    listen 443 ssl http2;
    server_name ai.yourcompany.com;

    # SSL Certificate (Let's Encrypt / Internal CA)
    ssl_certificate     /etc/ssl/certs/ollama.crt;
    ssl_certificate_key /etc/ssl/private/ollama.key;
    ssl_protocols       TLSv1.2 TLSv1.3;
    ssl_ciphers         HIGH:!aNULL:!MD5;

    # Basic Authentication
    auth_basic           "Ollama AI - Authorized Only";
    auth_basic_user_file /etc/nginx/.htpasswd;

    # Rate Limiting (defined in http block)
    limit_req zone=ollama_limit burst=5 nodelay;

    # Proxy to Ollama
    location / {
        proxy_pass http://ollama;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;

        # Timeout for AI (large models may take a long time)
        proxy_read_timeout 300s;
        proxy_send_timeout 300s;

        # Streaming support
        proxy_buffering off;
        chunked_transfer_encoding on;
    }

    # Block sensitive endpoints (model delete/download)
    # Note: "return" takes effect before "deny all", so a bare return is enough
    location /api/delete {
        return 403;
    }

    location /api/pull {
        return 403;
    }

    # Access Log
    access_log /var/log/nginx/ollama_access.log;
    error_log  /var/log/nginx/ollama_error.log;
}

# Redirect HTTP to HTTPS
server {
    listen 80;
    server_name ai.yourcompany.com;
    return 301 https://$server_name$request_uri;
}

This config does 5 things at once: (1) HTTPS encryption, (2) Basic Auth identity verification, (3) Rate Limiting, (4) blocks dangerous endpoints (delete/download models), (5) logs every request.
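Before putting this config live, validate it and issue the certificate. A sketch assuming the certbot nginx plugin is installed and `ai.yourcompany.com` (the placeholder domain from the config) resolves to this host:

```shell
# Validate nginx syntax before applying anything
sudo nginx -t

# Issue a Let's Encrypt certificate (requires public DNS and port 80 reachable)
sudo certbot --nginx -d ai.yourcompany.com

# Apply the new config without downtime
sudo systemctl reload nginx
```

For internal-only servers that Let's Encrypt cannot reach, use an internal CA certificate instead, as noted in the config comments.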

Authentication — Adding an Auth Layer

Since Ollama has no built-in Authentication, we need to add it ourselves through a Reverse Proxy. There are 3 main methods, each suited for different scenarios (similar to the principle of Digital Signature where identity must be verified before granting access):

| Method | Setup Difficulty | Best For | Pros | Cons |
|--------|------------------|----------|------|------|
| Basic Auth | Easy | Small teams of 2-10 people | Quick setup, works with nginx out of the box | Hard to manage users, no Token Expiry |
| API Key Header | Medium | Apps/Services calling the API | Great for M2M, easy to rotate keys | Requires additional Lua/Config |
| OAuth2 Proxy | Hard | Large organizations with SSO/LDAP | Integrates with SSO, Token Expiry, RBAC | Complex setup, requires an IdP |

Example: Setting up Basic Auth

# Create password file (requires apache2-utils)
sudo apt install apache2-utils

# Create first user (-c creates new file)
sudo htpasswd -c /etc/nginx/.htpasswd ai_user1

# Add additional users (without -c)
sudo htpasswd /etc/nginx/.htpasswd ai_user2

# Test API call with Auth
curl -u ai_user1:password123 https://ai.yourcompany.com/api/generate \
  -d '{"model":"qwen2.5","prompt":"Hello","stream":false}'

Example: API Key Header (nginx)

# Add to nginx server block
# Check for "X-API-Key" header
location / {
    if ($http_x_api_key != "your-secret-api-key-here") {
        return 401 '{"error": "Unauthorized"}';
    }
    proxy_pass http://ollama;
}

# Usage:
curl -H "X-API-Key: your-secret-api-key-here" \
  https://ai.yourcompany.com/api/generate \
  -d '{"model":"qwen2.5","prompt":"Hello","stream":false}'
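Note that `your-secret-api-key-here` is only a placeholder. Generate a real key with enough entropy rather than inventing one by hand, for example:

```shell
# Generate a 256-bit API key as 64 hex characters
openssl rand -hex 32
```

Store the key outside the nginx config where possible (e.g., an include file readable only by root) and rotate it periodically.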

Docker Isolation — Running Ollama in a Container

Running Ollama in a Docker Container is another important layer of protection because it isolates the process from the Host. If Ollama is exploited, the impact is contained within the Container only:

# docker-compose.yml
version: '3.8'

services:
  ollama:
    image: ollama/ollama:latest
    container_name: ollama-server
    restart: unless-stopped
    ports:
      - "127.0.0.1:11434:11434"  # Bind localhost only!
    volumes:
      - ollama_data:/root/.ollama
    deploy:
      resources:
        limits:
          memory: 16G      # Max RAM limit
          cpus: '8'         # Max CPU cores
        reservations:
          memory: 4G        # Minimum RAM
    # GPU support (NVIDIA)
    # runtime: nvidia
    # environment:
    #   - NVIDIA_VISIBLE_DEVICES=all

    # Security options
    security_opt:
      - no-new-privileges:true
    read_only: false
    tmpfs:
      - /tmp

volumes:
  ollama_data:
    driver: local

Benefits of running in Docker:

  • Isolation — Separated from the Host OS; if Ollama is exploited, the main machine is unaffected
  • Resource Control — Set maximum Memory/CPU limits to prevent AI from consuming all RAM and crashing the machine
  • Easy Update — Update by simply changing the image version and running docker compose up -d
  • Reproducible — Easy to migrate to another machine; just copy docker-compose.yml
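The "Easy Update" workflow mentioned above looks like this in practice (run from the directory containing docker-compose.yml):

```shell
# Pull the latest Ollama image and recreate the container in place
docker compose pull
docker compose up -d

# Optionally remove superseded images to reclaim disk space
docker image prune -f
```

Model data survives the update because it lives in the `ollama_data` volume, not in the container.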

Firewall & Network — Closing Unnecessary Access

Even with a Reverse Proxy in place, you should still configure a Firewall as an additional layer — this follows the Defense in Depth principle that requires multiple layers of protection:

# UFW (Ubuntu/Debian)
# Block Port 11434 from outside (Ollama listens on localhost only)
sudo ufw deny 11434

# Allow HTTPS for nginx Reverse Proxy
sudo ufw allow 443/tcp

# Allow SSH (for Admin)
sudo ufw allow 22/tcp

# Enable Firewall
sudo ufw enable

# Check Rules
sudo ufw status verbose

VPN/VLAN for Multi-Branch Organizations

If your organization has multiple branches or sites that need access to the AI Server, you should not expose Port 443 directly to the Internet. Instead, use:

  • VPN (WireGuard / OpenVPN) — Connect branch networks through encrypted tunnels
  • VLAN — Isolate the AI Server in a dedicated subnet accessible only to authorized machines
  • Zero Trust — Every request must be authenticated, even from within the network
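A minimal WireGuard sketch for the AI server side, with keys and subnets as placeholders you would replace with your own:

```
# /etc/wireguard/wg0.conf on the AI server (sketch; keys/subnets are placeholders)
[Interface]
Address    = 10.8.0.1/24
ListenPort = 51820
PrivateKey = <server-private-key>

[Peer]
# Branch office gateway
PublicKey  = <branch-public-key>
AllowedIPs = 10.8.0.2/32
```

With the tunnel up, you can restrict HTTPS to the VPN subnet instead of opening it broadly, e.g. `sudo ufw allow from 10.8.0.0/24 to any port 443 proto tcp`.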

Rate Limiting & Monitoring

Rate Limiting prevents a single user or bot from flooding the Server with requests until the GPU/RAM is maxed out:

nginx Rate Limit Config

# Add to http block (/etc/nginx/nginx.conf)
http {
    # Limit to 2 requests/second per IP
    limit_req_zone $binary_remote_addr zone=ollama_limit:10m rate=2r/s;

    # Limit concurrent connections to 5 per IP
    limit_conn_zone $binary_remote_addr zone=ollama_conn:10m;

    server {
        # ...
        limit_req zone=ollama_limit burst=5 nodelay;
        limit_conn ollama_conn 5;

        # Custom error message
        limit_req_status 429;
        error_page 429 = @rate_limited;
        location @rate_limited {
            return 429 '{"error": "Too many requests. Please wait."}';
        }
    }
}
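You can verify the limit actually triggers by sending requests faster than 2 r/s; once the burst of 5 is used up, you should start seeing 429 responses. A sketch using the placeholder domain and Basic Auth user from earlier:

```shell
# Fire 20 quick requests and print only the HTTP status codes
for i in $(seq 1 20); do
  curl -s -o /dev/null -w '%{http_code}\n' \
    -u ai_user1:password123 https://ai.yourcompany.com/api/tags
done
```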

Metrics to Monitor

| Metric | Tool | Alert Threshold | Why Monitor |
|--------|------|-----------------|-------------|
| Requests/sec | nginx access log / Prometheus | > 50 req/s | Detect abuse / DDoS |
| Response Time | nginx / Grafana | > 30 seconds | Model too large or server overloaded |
| GPU Usage | nvidia-smi / DCGM Exporter | > 95% for more than 10 min | GPU maxed out — need to scale or limit |
| Memory Usage | Prometheus node_exporter | > 90% RAM | Prevent OOM Kill |
| Error Rate (4xx/5xx) | nginx error log | > 5% of total | Detect bugs or attacks |
| Auth Failures | nginx access log (401) | > 10 times/min from same IP | Detect Brute Force |
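For the last row, a small helper can summarise auth failures per client IP. This is a sketch assuming nginx's default "combined" log format, where the client IP is field 1 and the status code is field 9:

```shell
# Count 401 (auth failure) responses per client IP, busiest IPs first
# (nginx "combined" format: field 1 = client IP, field 9 = HTTP status)
auth_failures() {
  awk '$9 == 401 { c[$1]++ } END { for (ip in c) print c[ip], ip }' "$1" \
    | sort -rn
}

# Usage:
# auth_failures /var/log/nginx/ollama_access.log
```

Run it from cron every minute and alert when any count exceeds your threshold (10/min in the table above).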

Things you must NOT do (Common Mistakes):

  • Never expose Port 11434 directly to the Internet — always go through a Reverse Proxy, even for "internal use only"
  • Never run Ollama as root — create a dedicated user (e.g., ollama) and run with that user
  • Never skip TLS in Production — even on internal networks, data sent to AI may contain business secrets
  • Never neglect logging — without logs, you won't know when you've been attacked, who accessed the system, or what they did
  • Never use weak passwords — use strong passwords, or better yet, API Key / Certificate-based Auth
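On the "never run as root" point: the official Linux install script already creates an `ollama` system user, but if you installed manually, a sketch (group creation behavior may vary by distro):

```shell
# Create a dedicated system user with no login shell and no home directory
sudo useradd --system --no-create-home --shell /usr/sbin/nologin ollama

# Run the systemd service as that user via a drop-in
sudo mkdir -p /etc/systemd/system/ollama.service.d
sudo tee /etc/systemd/system/ollama.service.d/user.conf <<'EOF'
[Service]
User=ollama
Group=ollama
EOF
sudo systemctl daemon-reload
sudo systemctl restart ollama
```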

Saeree ERP + Self-Host AI:

Saeree ERP supports on-premise deployment with SSL A+ security standards and built-in Two-Factor Authentication. If your organization needs an ERP system with 100% data ownership and wants to connect a private AI via Ollama, consult our team free of charge.


"Self-hosting AI doesn't end at installation — you need to manage security just like any system that's accessible over a network."

- Saeree ERP Team


Interested in ERP for Your Organization?

Consult with Grand Linux Solution experts — free of charge

Request Free Demo

Call 02-347-7730 | sale@grandlinux.com


About the Author

Paitoon Butri

Network & Server Security Specialist, Grand Linux Solution Co., Ltd.