3 April
Ollama Series EP.6 — In EP.5 we connected the Ollama API to apps and enterprise systems, and you may have noticed that Ollama has no built-in Authentication at all — if you set OLLAMA_HOST=0.0.0.0 to allow other machines to connect, anyone on the network can use your AI instantly. No password, no API Key, no protection whatsoever! EP.6 walks you through securing your self-hosted Ollama on every front — from Reverse Proxy, Authentication, Firewall, and TLS to Docker Isolation and Monitoring.
In short — What do you need to secure Ollama Self-Host?
- Reverse Proxy + TLS — Use nginx/Caddy as a gateway with HTTPS encryption
- Authentication — Add Basic Auth / API Key / OAuth for identity verification
- Firewall — Close unnecessary ports, open only what is needed
- Docker Isolation — Run inside a Container with limited resources
- Rate Limiting — Prevent abuse and DDoS attacks
- Monitoring & Logging — Detect anomalies in real time
Why Does Self-Hosted AI Need Security?
When you run Ollama on your own machine, all data stays within your organization's network — which is great for privacy. However, a careless configuration can turn that strength into a major vulnerability, because Ollama was designed to be developer-friendly rather than security-focused. The result: the default settings have no protection at all:
| Risk | Cause | Impact | Solution |
|---|---|---|---|
| No Authentication | Ollama has no built-in Auth | Anyone can use the AI and consume GPU for free | Add Auth Layer via Reverse Proxy |
| No TLS/HTTPS | API sends data as Plain Text | Data can be sniffed in transit | Set up TLS Certificate via nginx/Caddy |
| No Rate Limit | No limit on number of requests | Server crashes from request flooding / GPU at 100% | Configure Rate Limit in nginx |
| Port open to network | Setting OLLAMA_HOST=0.0.0.0 | Every machine on the network can access it | Bind localhost + Firewall |
| No Monitoring | No logs or alerts | No visibility into who is using it, when, or how much | Set up Access Log + Prometheus |
According to Shodan (a search engine for internet-connected devices), over 7,000 Ollama servers worldwide have Port 11434 open to the Internet without any Authentication. This means anyone can use those machines' GPUs to run AI for free — or even download/delete models from them. This is the same issue we discussed in our Two-Factor Authentication article: a single password alone is not enough — you need multiple layers of protection.
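A quick way to check whether your own instance is part of that statistic is to probe it from outside the host the same way a scanner would (YOUR_SERVER_IP is a placeholder for your machine's public address):

```bash
# If this returns a JSON model list without any credentials,
# the instance is open to anyone who can reach that address
curl -s http://YOUR_SERVER_IP:11434/api/tags
```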
10-Point Checklist — Secure Ollama Self-Host
Before diving into the details of each item, let's look at the overall Checklist:
| # | Item | Priority | Details |
|---|---|---|---|
| 1 | Bind localhost only | Critical | Set OLLAMA_HOST=127.0.0.1 — don't expose to the network directly |
| 2 | Reverse Proxy + TLS | Critical | Use nginx/Caddy + SSL Certificate to encrypt all requests |
| 3 | Authentication | Critical | Add Basic Auth / API Key / OAuth for identity verification before access |
| 4 | Firewall Rules | Critical | Use ufw/iptables to open only necessary ports |
| 5 | Docker Isolation | High | Run in a Container with Memory/CPU limits — won't affect the Host |
| 6 | Rate Limiting | High | Limit requests/second to prevent abuse and DDoS |
| 7 | Network Segmentation | High | Separate VLAN / Use VPN for machines that need AI access |
| 8 | Logging & Monitoring | High | Collect Access Logs + Prometheus/Grafana for Metrics |
| 9 | Regular Updates | Medium | Update Ollama regularly (via the install script or your package manager) to receive security patches |
| 10 | Backup Model Data | Medium | Back up Model + Modelfile data regularly |
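For item 1, on a standard Linux install Ollama runs as a systemd service, so the bind address is pinned with a drop-in override (the unit name and paths assume the stock installer — adjust for your setup):

```bash
# Open a drop-in override for the service
sudo systemctl edit ollama
# In the editor, add:
#   [Service]
#   Environment="OLLAMA_HOST=127.0.0.1:11434"

# Apply and verify
sudo systemctl daemon-reload
sudo systemctl restart ollama
ss -tlnp | grep 11434   # should show 127.0.0.1:11434, not 0.0.0.0
```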
Reverse Proxy + TLS — nginx Configuration
Instead of exposing Ollama's Port 11434 directly to other machines, use nginx as a gateway (Reverse Proxy) and add an SSL Certificate to encrypt communications:
```nginx
# /etc/nginx/sites-available/ollama.conf
upstream ollama {
    server 127.0.0.1:11434;
}

server {
    listen 443 ssl http2;
    server_name ai.yourcompany.com;

    # SSL Certificate (Let's Encrypt / Internal CA)
    ssl_certificate     /etc/ssl/certs/ollama.crt;
    ssl_certificate_key /etc/ssl/private/ollama.key;
    ssl_protocols       TLSv1.2 TLSv1.3;
    ssl_ciphers         HIGH:!aNULL:!MD5;

    # Basic Authentication
    auth_basic           "Ollama AI - Authorized Only";
    auth_basic_user_file /etc/nginx/.htpasswd;

    # Rate Limiting (zone defined in the http block)
    limit_req zone=ollama_limit burst=5 nodelay;

    # Proxy to Ollama
    location / {
        proxy_pass http://ollama;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;

        # Timeouts for AI (large models may take a long time)
        proxy_read_timeout 300s;
        proxy_send_timeout 300s;

        # Streaming support
        proxy_buffering off;
        chunked_transfer_encoding on;
    }

    # Block sensitive endpoints
    location /api/delete {
        deny all;
        return 403;
    }
    location /api/pull {
        deny all;
        return 403;
    }

    # Access Log
    access_log /var/log/nginx/ollama_access.log;
    error_log  /var/log/nginx/ollama_error.log;
}

# Redirect HTTP to HTTPS
server {
    listen 80;
    server_name ai.yourcompany.com;
    return 301 https://$server_name$request_uri;
}
```
This config does 5 things at once: (1) HTTPS encryption, (2) Basic Auth identity verification, (3) Rate Limiting, (4) blocks dangerous endpoints (delete/download models), (5) logs every request.
Authentication — Adding an Auth Layer
Since Ollama has no built-in Authentication, we have to add it ourselves through a Reverse Proxy. There are 3 main methods, each suited to a different scenario (the same principle as a Digital Signature: verify identity before granting access):
| Method | Setup Difficulty | Best For | Pros | Cons |
|---|---|---|---|---|
| Basic Auth | Easy | Small teams of 2-10 people | Quick setup, works with nginx out of the box | Hard to manage users, no Token Expiry |
| API Key Header | Medium | Apps/Services calling the API | Great for M2M, easy to rotate keys | Requires additional Lua/Config |
| OAuth2 Proxy | Hard | Large organizations with SSO/LDAP | Integrates with SSO, Token Expiry, RBAC | Complex setup, requires an IdP |
Example: Setting up Basic Auth
```bash
# Install htpasswd (part of apache2-utils)
sudo apt install apache2-utils

# Create the password file with the first user (-c creates a new file)
sudo htpasswd -c /etc/nginx/.htpasswd ai_user1

# Add more users (without -c, which would overwrite the file)
sudo htpasswd /etc/nginx/.htpasswd ai_user2

# Test an API call with Auth
curl -u ai_user1:password123 https://ai.yourcompany.com/api/generate \
     -d '{"model":"qwen2.5","prompt":"Hello","stream":false}'
```
Example: API Key Header (nginx)
```nginx
# Add to the nginx server block
# Reject requests that don't carry the "X-API-Key" header
location / {
    if ($http_x_api_key != "your-secret-api-key-here") {
        return 401 '{"error": "Unauthorized"}';
    }
    proxy_pass http://ollama;
}
```

Usage:

```bash
curl -H "X-API-Key: your-secret-api-key-here" \
     https://ai.yourcompany.com/api/generate \
     -d '{"model":"qwen2.5","prompt":"Hello","stream":false}'
```
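The key above is obviously a placeholder — generate a real one with enough entropy and keep it out of version control (assumes openssl is available):

```bash
# 32 random bytes, hex-encoded: a 64-character API key
openssl rand -hex 32
```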
Docker Isolation — Running Ollama in a Container
Running Ollama in a Docker Container is another important layer of protection because it isolates the process from the Host. If Ollama is exploited, the impact is contained within the Container only:
```yaml
# docker-compose.yml
version: '3.8'
services:
  ollama:
    image: ollama/ollama:latest
    container_name: ollama-server
    restart: unless-stopped
    ports:
      - "127.0.0.1:11434:11434"   # Bind localhost only!
    volumes:
      - ollama_data:/root/.ollama
    deploy:
      resources:
        limits:
          memory: 16G    # Max RAM limit
          cpus: '8'      # Max CPU cores
        reservations:
          memory: 4G     # Minimum RAM
    # GPU support (NVIDIA)
    # runtime: nvidia
    # environment:
    #   - NVIDIA_VISIBLE_DEVICES=all
    # Security options
    security_opt:
      - no-new-privileges:true
    read_only: false
    tmpfs:
      - /tmp

volumes:
  ollama_data:
    driver: local
```
Benefits of running in Docker:
- Isolation — Separated from the Host OS; if Ollama is exploited, the main machine is unaffected
- Resource Control — Set maximum Memory/CPU limits to prevent AI from consuming all RAM and crashing the machine
- Easy Update — Update by changing the image version and running `docker compose up -d`
- Reproducible — Easy to migrate to another machine; just copy docker-compose.yml
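For the Easy Update point, a typical update cycle with this compose file looks like the following (a sketch; assumes Docker Compose v2):

```bash
docker compose pull ollama   # fetch the latest ollama/ollama image
docker compose up -d         # recreate the container on the new image
docker image prune -f        # optionally reclaim space from old image layers
```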
Firewall & Network — Closing Unnecessary Access
Even with a Reverse Proxy in place, you should still configure a Firewall as an additional layer — this follows the Defense in Depth principle that requires multiple layers of protection:
```bash
# UFW (Ubuntu/Debian)
# Block Port 11434 from outside (Ollama listens on localhost only)
sudo ufw deny 11434

# Allow HTTPS for the nginx Reverse Proxy
sudo ufw allow 443/tcp

# Allow SSH (for Admin access)
sudo ufw allow 22/tcp

# Enable the Firewall
sudo ufw enable

# Check the rules
sudo ufw status verbose
```
VPN/VLAN for Multi-Branch Organizations
If your organization has multiple branches or sites that need access to the AI Server, you should not expose Port 443 directly to the Internet. Instead, use:
- VPN (WireGuard / OpenVPN) — Connect branch networks through encrypted tunnels
- VLAN — Isolate the AI Server in a dedicated subnet accessible only to authorized machines
- Zero Trust — Every request must be authenticated, even from within the network
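As a sketch of the VPN option, a minimal WireGuard configuration on the AI-server side might look like this (every key, address, and port here is a placeholder, not a value from this article):

```ini
# /etc/wireguard/wg0.conf — AI-server side (placeholder values)
[Interface]
Address    = 10.10.0.1/24
ListenPort = 51820
PrivateKey = <server-private-key>

[Peer]
# Branch-office gateway
PublicKey  = <branch-public-key>
AllowedIPs = 10.10.0.2/32
```

Branch machines then reach the AI at 10.10.0.1 through the encrypted tunnel, and Port 443 never faces the public Internet.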
Rate Limiting & Monitoring
Rate Limiting prevents a single user or bot from flooding the Server with requests until the GPU/RAM is maxed out:
nginx Rate Limit Config
```nginx
# Add to the http block (/etc/nginx/nginx.conf)
http {
    # Limit to 2 requests/second per IP
    limit_req_zone $binary_remote_addr zone=ollama_limit:10m rate=2r/s;

    # Zone for counting concurrent connections per IP
    # (the limit of 5 is applied with limit_conn below)
    limit_conn_zone $binary_remote_addr zone=ollama_conn:10m;

    server {
        # ...
        limit_req zone=ollama_limit burst=5 nodelay;
        limit_conn ollama_conn 5;

        # Custom error response
        limit_req_status 429;
        error_page 429 = @rate_limited;

        location @rate_limited {
            return 429 '{"error": "Too many requests. Please wait."}';
        }
    }
}
```
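To confirm the limit is active, a quick smoke test is to fire requests faster than 2 r/s and watch the status codes flip to 429 (hostname and credentials are the placeholder values used earlier in this article):

```bash
# With rate=2r/s and burst=5, roughly the first handful should return 200,
# and the rest of a rapid burst should return 429
for i in $(seq 1 10); do
  curl -s -o /dev/null -w "%{http_code}\n" \
    -u ai_user1:password123 https://ai.yourcompany.com/api/tags
done
```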
Metrics to Monitor
| Metric | Tool | Alert Threshold | Why Monitor |
|---|---|---|---|
| Requests/sec | nginx access log / Prometheus | > 50 req/s | Detect abuse / DDoS |
| Response Time | nginx / Grafana | > 30 seconds | Model too large or server overloaded |
| GPU Usage | nvidia-smi / DCGM Exporter | > 95% for more than 10 min | GPU maxed out — need to scale or limit |
| Memory Usage | Prometheus node_exporter | > 90% RAM | Prevent OOM Kill |
| Error Rate (4xx/5xx) | nginx error log | > 5% of total | Detect bugs or attacks |
| Auth Failures | nginx access log (401) | > 10 times/min from same IP | Detect Brute Force |
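The last row of the table (Brute Force detection) can be scripted straight from the access log. This assumes nginx's default combined log format, where the status code is the ninth whitespace-separated field:

```bash
#!/bin/sh
# Count 401 responses per client IP, worst offenders first.
# LOG defaults to the access log path used in the nginx config above.
LOG="${LOG:-/var/log/nginx/ollama_access.log}"
if [ -r "$LOG" ]; then
  awk '$9 == 401 {print $1}' "$LOG" | sort | uniq -c | sort -rn | head
fi
```

Feed the output into an alert (e.g., more than 10 hits/min from one IP, per the table) with your monitoring tool of choice.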
Things you must NOT do (Common Mistakes):
- Never expose Port 11434 directly to the Internet — always go through a Reverse Proxy, even for "internal use only"
- Never run Ollama as root — create a dedicated user (e.g., `ollama`) and run the service as that user
- Never skip TLS in Production — even on internal networks, data sent to the AI may contain business secrets
- Never neglect logging — without logs, you won't know when you've been attacked, who accessed the system, or what they did
- Never use weak passwords — use strong passwords, or better yet, API Key / Certificate-based Auth
Saeree ERP + Self-Host AI:
Saeree ERP supports on-premise deployment with an A+ SSL rating and built-in Two-Factor Authentication. If your organization needs an ERP system with 100% data ownership and wants to connect a private AI via Ollama — consult our team free of charge.
Ollama Series — Read More
Ollama Series — 6 Episodes for Complete Local AI:
- EP.1: What is Ollama? — Run AI on Your Own Machine
- EP.2: Install Ollama on Every OS — macOS / Windows / Linux
- EP.3: Ollama in Practice — Choose Models, Write Prompts, and Create Modelfiles
- EP.4: Ollama + RAG — Build AI That Answers from Your Documents
- EP.5: Ollama API — Connect AI to Your Apps and Enterprise Systems
- EP.6: Secure Self-Hosted AI — Security & Best Practices (this article)
"Self-hosting AI doesn't end at installation — you need to manage security just like any system that's accessible over a network."
- Saeree ERP Team


