
Self-Host AI Security

Secure Self-Hosted AI
Tags: Security Best Practices, Ollama, nginx, Docker, Firewall
3 April

Ollama Series EP.6 — From EP.5 where we connected Ollama API to apps and enterprise systems, you may have noticed that Ollama has no built-in Authentication system at all — if you set OLLAMA_HOST=0.0.0.0 to allow other machines to connect, anyone on the network can use your AI instantly. No password required, no API Key, no protection whatsoever! EP.6 will walk you through securing your Ollama Self-Host on every front — from Reverse Proxy, Authentication, Firewall, TLS, Docker Isolation to Monitoring.

In short — What do you need to secure Ollama Self-Host?

  • Reverse Proxy + TLS — Use nginx/Caddy as a gateway with HTTPS encryption
  • Authentication — Add Basic Auth / API Key / OAuth for identity verification
  • Firewall — Close unnecessary ports, open only what is needed
  • Docker Isolation — Run inside a Container with limited resources
  • Rate Limiting — Prevent abuse and DDoS attacks
  • Monitoring & Logging — Detect anomalies in real time

Why Does Self-Hosted AI Need Security?

When you run Ollama on your own machine, all data stays within your organization's network — which is great for privacy. However, poor configuration can turn into a major vulnerability because Ollama is designed to be developer-friendly rather than security-focused from the start. The result: default settings have no protection at all:

| Risk | Cause | Impact | Solution |
|------|-------|--------|----------|
| No Authentication | Ollama has no built-in Auth | Anyone can use the AI and consume GPU for free | Add Auth Layer via Reverse Proxy |
| No TLS/HTTPS | API sends data as Plain Text | Data can be sniffed in transit | Set up TLS Certificate via nginx/Caddy |
| No Rate Limit | No limit on number of requests | Server crashes from request flooding / GPU at 100% | Configure Rate Limit in nginx |
| Port open to network | Setting OLLAMA_HOST=0.0.0.0 | Every machine on the network can access it | Bind localhost + Firewall |
| No Monitoring | No logs or alerts | No visibility into who is using it, when, or how much | Set up Access Log + Prometheus |

According to Shodan (a search engine for internet-connected devices), over 7,000 Ollama servers worldwide have Port 11434 open to the Internet without any Authentication. This means anyone can use those machines' GPUs to run AI for free — or even download/delete models from them. This is the same issue we discussed in our Two-Factor Authentication article: a single password alone is not enough — you need multiple layers of protection.
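You can check your own exposure in under a minute. A quick sketch (the IP address below is a documentation placeholder, substitute your server's public address):

```shell
# From OUTSIDE your network: replace 203.0.113.10 with your server's
# public IP (placeholder). A JSON model list in the response means
# the server is wide open:
curl --max-time 5 http://203.0.113.10:11434/api/tags

# On the server itself, check which address Ollama is bound to:
ss -tlnp | grep 11434
# 127.0.0.1:11434 = localhost only (safe); 0.0.0.0:11434 = exposed
```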

10-Point Checklist — Secure Ollama Self-Host

Before diving into the details of each item, let's look at the overall Checklist:

| # | Item | Priority | Details |
|---|------|----------|---------|
| 1 | Bind localhost only | Critical | Set OLLAMA_HOST=127.0.0.1 — don't expose to the network directly |
| 2 | Reverse Proxy + TLS | Critical | Use nginx/Caddy + SSL Certificate to encrypt all requests |
| 3 | Authentication | Critical | Add Basic Auth / API Key / OAuth for identity verification before access |
| 4 | Firewall Rules | Critical | Use ufw/iptables to open only necessary ports |
| 5 | Docker Isolation | High | Run in a Container with Memory/CPU limits — won't affect the Host |
| 6 | Rate Limiting | High | Limit requests/second to prevent abuse and DDoS |
| 7 | Network Segmentation | High | Separate VLAN / Use VPN for machines that need AI access |
| 8 | Logging & Monitoring | High | Collect Access Logs + Prometheus/Grafana for Metrics |
| 9 | Regular Updates | Medium | Update Ollama regularly (re-run the install script or pull the latest Docker image) to receive security patches |
| 10 | Backup Model Data | Medium | Back up Model + Modelfile data regularly |
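For item 1, if you installed Ollama with the official Linux install script (which registers it as a systemd service), a systemd drop-in is the usual place to pin the bind address. A sketch, assuming that service layout:

```shell
# Create a systemd override that binds Ollama to localhost only
sudo mkdir -p /etc/systemd/system/ollama.service.d
sudo tee /etc/systemd/system/ollama.service.d/override.conf <<'EOF'
[Service]
Environment="OLLAMA_HOST=127.0.0.1:11434"
EOF

# Apply the change
sudo systemctl daemon-reload
sudo systemctl restart ollama
```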

Reverse Proxy + TLS — nginx Configuration

Instead of exposing Ollama's Port 11434 directly to other machines, use nginx as a gateway (Reverse Proxy) and add an SSL Certificate to encrypt communications:

# /etc/nginx/sites-available/ollama.conf

upstream ollama {
    server 127.0.0.1:11434;
}

server {
    listen 443 ssl http2;
    server_name ai.yourcompany.com;

    # SSL Certificate (Let's Encrypt / Internal CA)
    ssl_certificate     /etc/ssl/certs/ollama.crt;
    ssl_certificate_key /etc/ssl/private/ollama.key;
    ssl_protocols       TLSv1.2 TLSv1.3;
    ssl_ciphers         HIGH:!aNULL:!MD5;

    # Basic Authentication
    auth_basic           "Ollama AI - Authorized Only";
    auth_basic_user_file /etc/nginx/.htpasswd;

    # Rate Limiting (defined in http block)
    limit_req zone=ollama_limit burst=5 nodelay;

    # Proxy to Ollama
    location / {
        proxy_pass http://ollama;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;

        # Timeout for AI (large models may take a long time)
        proxy_read_timeout 300s;
        proxy_send_timeout 300s;

        # Streaming support
        proxy_buffering off;
        chunked_transfer_encoding on;
    }

    # Block sensitive endpoints (model delete/download)
    # Note: "return" takes effect before "deny all", so a bare return is enough
    location /api/delete {
        return 403;
    }

    location /api/pull {
        return 403;
    }

    # Access Log
    access_log /var/log/nginx/ollama_access.log;
    error_log  /var/log/nginx/ollama_error.log;
}

# Redirect HTTP to HTTPS
server {
    listen 80;
    server_name ai.yourcompany.com;
    return 301 https://$server_name$request_uri;
}

This config does 5 things at once: (1) HTTPS encryption, (2) Basic Auth identity verification, (3) Rate Limiting, (4) blocks dangerous endpoints (delete/download models), (5) logs every request.
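Before putting this config live, validate it and issue the certificate. A sketch assuming the certbot nginx plugin is installed and `ai.yourcompany.com` (the placeholder domain from the config) resolves to this host:

```shell
# Validate nginx syntax before applying anything
sudo nginx -t

# Issue a Let's Encrypt certificate (requires public DNS and port 80 reachable)
sudo certbot --nginx -d ai.yourcompany.com

# Apply the new config without downtime
sudo systemctl reload nginx
```

For internal-only servers that Let's Encrypt cannot reach, use an internal CA certificate instead, as noted in the config comments.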

Authentication — Adding an Auth Layer

Since Ollama has no built-in Authentication, we need to add it ourselves through a Reverse Proxy. There are 3 main methods, each suited for different scenarios (similar to the principle of Digital Signature where identity must be verified before granting access):

| Method | Setup Difficulty | Best For | Pros | Cons |
|--------|------------------|----------|------|------|
| Basic Auth | Easy | Small teams of 2-10 people | Quick setup, works with nginx out of the box | Hard to manage users, no Token Expiry |
| API Key Header | Medium | Apps/Services calling the API | Great for M2M, easy to rotate keys | Requires additional Lua/Config |
| OAuth2 Proxy | Hard | Large organizations with SSO/LDAP | Integrates with SSO, Token Expiry, RBAC | Complex setup, requires an IdP |

Example: Setting up Basic Auth

# Create password file (requires apache2-utils)
sudo apt install apache2-utils

# Create first user (-c creates new file)
sudo htpasswd -c /etc/nginx/.htpasswd ai_user1

# Add additional users (without -c)
sudo htpasswd /etc/nginx/.htpasswd ai_user2

# Test API call with Auth
curl -u ai_user1:password123 https://ai.yourcompany.com/api/generate \
  -d '{"model":"qwen2.5","prompt":"Hello","stream":false}'

Example: API Key Header (nginx)

# Add to nginx server block
# Check for "X-API-Key" header
location / {
    if ($http_x_api_key != "your-secret-api-key-here") {
        return 401 '{"error": "Unauthorized"}';
    }
    proxy_pass http://ollama;
}

# Usage:
curl -H "X-API-Key: your-secret-api-key-here" \
  https://ai.yourcompany.com/api/generate \
  -d '{"model":"qwen2.5","prompt":"Hello","stream":false}'
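Note that `your-secret-api-key-here` is only a placeholder. Generate a real key with enough entropy rather than inventing one by hand, for example:

```shell
# Generate a 256-bit API key as 64 hex characters
openssl rand -hex 32
```

Store the key outside the nginx config where possible (e.g., an include file readable only by root) and rotate it periodically.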

Docker Isolation — Running Ollama in a Container

Running Ollama in a Docker Container is another important layer of protection because it isolates the process from the Host. If Ollama is exploited, the impact is contained within the Container only:

# docker-compose.yml
version: '3.8'

services:
  ollama:
    image: ollama/ollama:latest
    container_name: ollama-server
    restart: unless-stopped
    ports:
      - "127.0.0.1:11434:11434"  # Bind localhost only!
    volumes:
      - ollama_data:/root/.ollama
    deploy:
      resources:
        limits:
          memory: 16G      # Max RAM limit
          cpus: '8'         # Max CPU cores
        reservations:
          memory: 4G        # Minimum RAM
    # GPU support (NVIDIA)
    # runtime: nvidia
    # environment:
    #   - NVIDIA_VISIBLE_DEVICES=all

    # Security options
    security_opt:
      - no-new-privileges:true
    read_only: false
    tmpfs:
      - /tmp

volumes:
  ollama_data:
    driver: local

Benefits of running in Docker:

  • Isolation — Separated from the Host OS; if Ollama is exploited, the main machine is unaffected
  • Resource Control — Set maximum Memory/CPU limits to prevent AI from consuming all RAM and crashing the machine
  • Easy Update — Update by simply changing the image version and running docker compose up -d
  • Reproducible — Easy to migrate to another machine; just copy docker-compose.yml
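The "Easy Update" workflow mentioned above looks like this in practice (run from the directory containing docker-compose.yml):

```shell
# Pull the latest Ollama image and recreate the container in place
docker compose pull
docker compose up -d

# Optionally remove superseded images to reclaim disk space
docker image prune -f
```

Model data survives the update because it lives in the `ollama_data` volume, not in the container.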

Firewall & Network — Closing Unnecessary Access

Even with a Reverse Proxy in place, you should still configure a Firewall as an additional layer — this follows the Defense in Depth principle that requires multiple layers of protection:

# UFW (Ubuntu/Debian)
# Block Port 11434 from outside (Ollama listens on localhost only)
sudo ufw deny 11434

# Allow HTTPS for nginx Reverse Proxy
sudo ufw allow 443/tcp

# Allow SSH (for Admin)
sudo ufw allow 22/tcp

# Enable Firewall
sudo ufw enable

# Check Rules
sudo ufw status verbose

VPN/VLAN for Multi-Branch Organizations

If your organization has multiple branches or sites that need access to the AI Server, you should not expose Port 443 directly to the Internet. Instead, use:

  • VPN (WireGuard / OpenVPN) — Connect branch networks through encrypted tunnels
  • VLAN — Isolate the AI Server in a dedicated subnet accessible only to authorized machines
  • Zero Trust — Every request must be authenticated, even from within the network
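A minimal WireGuard sketch for the AI server side, with keys and subnets as placeholders you would replace with your own:

```
# /etc/wireguard/wg0.conf on the AI server (sketch; keys/subnets are placeholders)
[Interface]
Address    = 10.8.0.1/24
ListenPort = 51820
PrivateKey = <server-private-key>

[Peer]
# Branch office gateway
PublicKey  = <branch-public-key>
AllowedIPs = 10.8.0.2/32
```

With the tunnel up, you can restrict HTTPS to the VPN subnet instead of opening it broadly, e.g. `sudo ufw allow from 10.8.0.0/24 to any port 443 proto tcp`.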

Rate Limiting & Monitoring

Rate Limiting prevents a single user or bot from flooding the Server with requests until the GPU/RAM is maxed out:

nginx Rate Limit Config

# Add to http block (/etc/nginx/nginx.conf)
http {
    # Limit to 2 requests/second per IP
    limit_req_zone $binary_remote_addr zone=ollama_limit:10m rate=2r/s;

    # Limit concurrent connections to 5 per IP
    limit_conn_zone $binary_remote_addr zone=ollama_conn:10m;

    server {
        # ...
        limit_req zone=ollama_limit burst=5 nodelay;
        limit_conn ollama_conn 5;

        # Custom error message
        limit_req_status 429;
        error_page 429 = @rate_limited;
        location @rate_limited {
            return 429 '{"error": "Too many requests. Please wait."}';
        }
    }
}
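You can verify the limit actually triggers by sending requests faster than 2 r/s; once the burst of 5 is used up, you should start seeing 429 responses. A sketch using the placeholder domain and Basic Auth user from earlier:

```shell
# Fire 20 quick requests and print only the HTTP status codes
for i in $(seq 1 20); do
  curl -s -o /dev/null -w '%{http_code}\n' \
    -u ai_user1:password123 https://ai.yourcompany.com/api/tags
done
```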

Metrics to Monitor

| Metric | Tool | Alert Threshold | Why Monitor |
|--------|------|-----------------|-------------|
| Requests/sec | nginx access log / Prometheus | > 50 req/s | Detect abuse / DDoS |
| Response Time | nginx / Grafana | > 30 seconds | Model too large or server overloaded |
| GPU Usage | nvidia-smi / DCGM Exporter | > 95% for more than 10 min | GPU maxed out — need to scale or limit |
| Memory Usage | Prometheus node_exporter | > 90% RAM | Prevent OOM Kill |
| Error Rate (4xx/5xx) | nginx error log | > 5% of total | Detect bugs or attacks |
| Auth Failures | nginx access log (401) | > 10 times/min from same IP | Detect Brute Force |
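For the last row, a small helper can summarise auth failures per client IP. This is a sketch assuming nginx's default "combined" log format, where the client IP is field 1 and the status code is field 9:

```shell
# Count 401 (auth failure) responses per client IP, busiest IPs first
# (nginx "combined" format: field 1 = client IP, field 9 = HTTP status)
auth_failures() {
  awk '$9 == 401 { c[$1]++ } END { for (ip in c) print c[ip], ip }' "$1" \
    | sort -rn
}

# Usage:
# auth_failures /var/log/nginx/ollama_access.log
```

Run it from cron every minute and alert when any count exceeds your threshold (10/min in the table above).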

Things you must NOT do (Common Mistakes):

  • Never expose Port 11434 directly to the Internet — always go through a Reverse Proxy, even for "internal use only"
  • Never run Ollama as root — create a dedicated user (e.g., ollama) and run with that user
  • Never skip TLS in Production — even on internal networks, data sent to AI may contain business secrets
  • Never neglect logging — without logs, you won't know when you've been attacked, who accessed the system, or what they did
  • Never use weak passwords — use strong passwords, or better yet, API Key / Certificate-based Auth
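On the "never run as root" point: the official Linux install script already creates an `ollama` system user, but if you installed manually, a sketch (group creation behavior may vary by distro):

```shell
# Create a dedicated system user with no login shell and no home directory
sudo useradd --system --no-create-home --shell /usr/sbin/nologin ollama

# Run the systemd service as that user via a drop-in
sudo mkdir -p /etc/systemd/system/ollama.service.d
sudo tee /etc/systemd/system/ollama.service.d/user.conf <<'EOF'
[Service]
User=ollama
Group=ollama
EOF
sudo systemctl daemon-reload
sudo systemctl restart ollama
```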

Saeree ERP + Self-Host AI:

Saeree ERP supports on-premise deployment with SSL A+ security standards and built-in Two-Factor Authentication. If your organization needs an ERP system with 100% data ownership and wants to connect a private AI via Ollama, consult our team free of charge.


"Self-hosting AI doesn't end at installation — you need to manage security just like any system that's accessible over a network."

- Saeree ERP Team


Interested in ERP for Your Organization?

Consult with Grand Linux Solution experts — free of charge

Request Free Demo

Call 02-347-7730 | sale@grandlinux.com


About the Author

Paitoon Butri

Network & Server Security Specialist, Grand Linux Solution Co., Ltd.