

Install Ollama on Every OS

  • 2 April

Ollama Series EP.2 — EP.1 explained what Ollama is; now it's time to get hands-on! This article walks you through installing Ollama from scratch on macOS, Windows, or Linux and running your first AI model within 5 minutes, including essential commands, choosing the right model for your hardware, and troubleshooting common issues.

In short — Installing Ollama is easier than you think

  • macOS: Download .dmg from website → drag to Applications → done
  • Windows: Download .exe → install → done
  • Linux: Single command curl -fsSL https://ollama.com/install.sh | sh
  • Run your first model: ollama run llama3.1 — get an AI chatbot instantly
  • Installation takes < 5 minutes (not including model download time)

System Requirements — What Hardware Do You Need?

Before installing, check that your machine meets the requirements. Ollama doesn't need a powerful machine: a regular laptop with 8 GB RAM can run 7-8B models, and a GPU speeds things up significantly.

Spec | Minimum | Recommended | Enterprise
OS | macOS 11+, Windows 10+, Linux | macOS 14+, Windows 11, Ubuntu 22.04+ | Ubuntu 22.04 LTS Server
RAM | 8 GB | 16 GB | 64-128 GB
Disk Space | 10 GB | 50 GB | 500 GB+ SSD
CPU | x86_64 or Apple Silicon | Apple M1+ or Intel i7+ | AMD EPYC / Intel Xeon
GPU (Optional) | Not required (CPU works) | NVIDIA RTX 3060+ (VRAM 8 GB+) | NVIDIA RTX 4090 / A100 (VRAM 24-80 GB)
Models You Can Run | 7-8B (Llama 3.1 8B, Gemma 2 9B) | 14-34B (Phi-4, CodeLlama 34B) | 70B+ (Llama 3.1 70B, Qwen 72B)
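
As a rule of thumb from the table above, you can match installed RAM to a model class before downloading anything. Here is a minimal sketch; the thresholds and example model names are our own approximations, not official Ollama guidance:

```shell
# Rough RAM-to-model-class guide (thresholds are a rule of thumb,
# not official Ollama guidance)
recommend_model() {
  ram_gb=$1
  if [ "$ram_gb" -ge 64 ]; then
    echo "70B-class, e.g. llama3.1:70b"
  elif [ "$ram_gb" -ge 16 ]; then
    echo "14B-34B class, e.g. phi4 or codellama:34b"
  elif [ "$ram_gb" -ge 8 ]; then
    echo "7-8B class, e.g. llama3.1 or gemma2:9b"
  else
    echo "stick to small models, e.g. gemma2:2b"
  fi
}

recommend_model 16
```

Call it with your machine's RAM in GB; remember that other applications also need memory, so treat the result as an upper bound.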

Apple Silicon Has the Advantage!

Macs with M1, M2, M3, or M4 chips have a significant advantage for running Ollama because Unified Memory lets the CPU and GPU share RAM: a MacBook Pro M3 Pro with 18 GB RAM can smoothly run 14B models, while a PC with 16 GB RAM may need a discrete GPU.

Install on macOS

macOS was the first platform Ollama supported and works best, especially on Apple Silicon:

Method 1: Download from Website (Recommended)

  1. Go to ollama.com and click "Download for macOS"
  2. Open the downloaded file (Ollama-darwin.zip, or a .dmg in recent versions)
  3. Drag Ollama.app to Applications
  4. Open Ollama.app — you'll see the Llama icon in the Menu Bar (top right)
  5. Open Terminal and type ollama run llama3.1

Method 2: Via Homebrew

For developers who already use Homebrew:

brew install ollama
ollama serve

Install on Windows

Ollama supports Windows 10 and above (64-bit), including NVIDIA GPU, AMD GPU (Radeon RX 6000+), and CPU-only:

  1. Go to ollama.com and click "Download for Windows"
  2. Run OllamaSetup.exe
  3. Follow the installation steps (click Next → Install → Finish)
  4. Ollama will run as a Background Service automatically — look for the icon in System Tray
  5. Open PowerShell or Command Prompt and type ollama run llama3.1

Windows: Important Notes

  • Must be 64-bit only (Windows 32-bit not supported)
  • If using NVIDIA GPU, update drivers to version 452.39 or above
  • Windows Defender may warn during download — click "More info" → "Run anyway" (Ollama is safe open-source software)
  • If your organization's firewall blocks it, open Port 11434 (Ollama API)

Install on Linux

Linux is the best platform for setting up an organization's Ollama Server — stable and resource-efficient:

Method 1: Automatic Script (Recommended)

curl -fsSL https://ollama.com/install.sh | sh

This script downloads Ollama, installs it, creates a systemd service, and starts the Ollama Server automatically. It supports Ubuntu, Debian, Fedora, CentOS, RHEL, Arch Linux, and more.

Method 2: Download Binary Manually

# Download the binary (writing to /usr/local/bin needs root)
sudo curl -L https://ollama.com/download/ollama-linux-amd64 -o /usr/local/bin/ollama
sudo chmod +x /usr/local/bin/ollama

# Start Server
ollama serve
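
The automatic script in Method 1 also registers a systemd service; with the manual binary you must add one yourself if you want Ollama to start at boot. Below is a minimal sketch of such a unit — the service user and binary path are assumptions, so adapt them to your setup:

```shell
# Minimal systemd unit so Ollama starts at boot (a sketch; the
# automatic install script generates a more complete unit for you)
sudo tee /etc/systemd/system/ollama.service >/dev/null <<'EOF'
[Unit]
Description=Ollama Server
After=network-online.target

[Service]
ExecStart=/usr/local/bin/ollama serve
Restart=always
# Assumes an "ollama" system user exists; create one with:
#   sudo useradd -r -s /bin/false -m -d /usr/share/ollama ollama
User=ollama

[Install]
WantedBy=multi-user.target
EOF

sudo systemctl daemon-reload
sudo systemctl enable --now ollama
```

Check the result with systemctl status ollama; logs go to the journal (journalctl -u ollama).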

Method 3: Docker

For organizations already using Docker — ideal for enterprise-level infrastructure management:

# CPU only
docker run -d -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama

# NVIDIA GPU
docker run -d --gpus=all -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama

# Run Model
docker exec -it ollama ollama run llama3.1

Run Your First AI Model

After installation, regardless of OS, the next step is the same — open Terminal and type:

ollama run llama3.1

The first time, Ollama will download the model (~4.7 GB for Llama 3.1 8B), which may take 2-10 minutes depending on your internet speed. After downloading, you'll see a prompt to type your question:

>>> Hello, what AI are you?

Hello! I am Llama 3.1, a Large Language Model developed by Meta.
I can help answer questions, write articles, translate languages, and more.
How can I help you?

Done! You now have a private AI chatbot on your own machine — everything you type stays on your computer; nothing is sent outside. Type /bye (or press Ctrl+D) to exit.
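
Where does the ~4.7 GB figure come from? Default Ollama tags are 4-bit quantized, so a back-of-the-envelope estimate is parameter count times roughly 4.5 bits per weight (the 4.5 figure is our approximation, folding in quantization overhead):

```shell
# Rough download-size estimate for a 4-bit quantized model:
# bytes ~= parameters * bits-per-weight / 8
awk 'BEGIN {
  params = 8e9            # Llama 3.1 8B
  bits_per_weight = 4.5   # ~q4 quantization plus overhead (approximation)
  printf "~%.1f GB\n", params * bits_per_weight / 8 / 1e9
}'
# prints "~4.5 GB", in the same ballpark as the actual 4.7 GB download
```

The same arithmetic explains why a 70B model is a ~40 GB download and why it needs the RAM listed in the Enterprise column.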

Essential Commands You Need to Know

After installing Ollama, these are the commands you'll use most frequently:

Command | What It Does | Example
ollama run <model> | Run a model (auto-downloads it if not present) | ollama run llama3.1
ollama pull <model> | Download a model without running it | ollama pull qwen2.5:72b
ollama list | List downloaded models | ollama list
ollama rm <model> | Remove a model from your machine | ollama rm mistral
ollama show <model> | Show model info (size, parameters) | ollama show llama3.1
ollama ps | Show currently running models | ollama ps
ollama serve | Start the Ollama Server (usually runs automatically) | ollama serve
ollama cp <src> <dst> | Copy an existing model under a new name | ollama cp llama3.1 my-assistant

Test — Run Multiple Models for Comparison

After installation, try running multiple models to compare — each has different strengths:

# All-round model (by Meta)
ollama run llama3.1

# Strong Asian languages (by Alibaba)
ollama run qwen2.5

# Reasoning model - thinks before answering (by DeepSeek)
ollama run deepseek-r1:8b

# Coding model (by Alibaba)
ollama run qwen2.5-coder

# Google model — very fast
ollama run gemma2

# Vision model — reads images
ollama run llava

Try asking the same question to multiple models and compare the answers — you'll see each has a different response style: some excel at Asian languages, others at coding. For detailed model comparisons, see the popular models table in EP.1.

Install GUI — Open WebUI (No Terminal Needed)

For those who prefer not to type commands in Terminal, install Open WebUI — a beautiful ChatGPT-like web interface that connects to Ollama instantly:

# Install Open WebUI with Docker (Docker required)
docker run -d -p 3000:8080 \
  --add-host=host.docker.internal:host-gateway \
  -v open-webui:/app/backend/data \
  --name open-webui \
  --restart always \
  ghcr.io/open-webui/open-webui:main

Open your browser to http://localhost:3000, create an admin account, select a downloaded model, and start chatting! Open WebUI is perfect for organizations that want all employees to use AI without opening a Terminal — they just open the web page and type, like in ChatGPT.

Troubleshooting Common Issues

Problem | Cause | Solution
"Error: model requires more memory" | Not enough RAM for the selected model | Use a smaller model (e.g., switch from 70B to 8B) or a quantized version: ollama run llama3.1:8b-q4_0
"Error: listen tcp 127.0.0.1:11434: bind: address already in use" | Ollama is already running (possibly in the background) | Stop the running instance first: pkill ollama (Linux/macOS) or quit it from the System Tray (Windows)
GPU not being used (very slow) | Outdated drivers, or Ollama can't detect the GPU | Update NVIDIA drivers to the latest version, verify with nvidia-smi, then restart Ollama
Poor Thai output / nonsensical answers | Some models aren't good at Thai | Switch to qwen2.5 or llama3.1:70b, which have better Thai support
Model download slow / interrupted | Slow or unstable internet | Run ollama pull <model> again; it resumes from where it stopped
Can't access from other machines (server) | Ollama listens on localhost only | Set the environment variable: OLLAMA_HOST=0.0.0.0 ollama serve
Disk full | Too many downloaded models | List models with ollama list, then remove unused ones: ollama rm <model>
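
For the last row (disk full), you can script over ollama list to rank models by size. The sketch below embeds sample output so it runs standalone; the model IDs are illustrative, and the column layout (NAME, ID, SIZE, MODIFIED) should be verified against your Ollama version:

```shell
# Rank downloaded models by size to pick removal candidates.
# Sample `ollama list` output is embedded here; in practice pipe the
# real command:  ollama list | awk 'NR>1 {print $3, $4, $1}' | sort -rn
awk 'NR>1 {print $3, $4, $1}' <<'EOF' | sort -rn
NAME              ID              SIZE      MODIFIED
llama3.1:latest   42182419e950    4.7 GB    2 days ago
qwen2.5:72b       5c9cc1f5ba54    47 GB     5 weeks ago
gemma2:latest     ff02c3702f32    5.4 GB    3 weeks ago
EOF
```

Note that sort -rn compares only the leading number, so this assumes all sizes share the same unit (GB); with mixed MB/GB sizes you would need to normalize the unit first.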

For Organizations — Set Up Ollama Server for Everyone

To let everyone in the organization use Ollama together without installing on every machine — just set up one server and let everyone access via Open WebUI:

# 1. Install Ollama on Linux server
curl -fsSL https://ollama.com/install.sh | sh

# 2. Configure to listen on all IPs (not just localhost)
sudo systemctl edit ollama
# Add this line:
# [Service]
# Environment="OLLAMA_HOST=0.0.0.0"

# 3. Restart Ollama
sudo systemctl restart ollama

# 4. Download desired models
ollama pull llama3.1
ollama pull qwen2.5
ollama pull deepseek-r1:8b

# 5. Install Open WebUI with Docker
docker run -d -p 3000:8080 \
  -e OLLAMA_BASE_URL=http://host.docker.internal:11434 \
  -v open-webui:/app/backend/data \
  --name open-webui \
  --restart always \
  ghcr.io/open-webui/open-webui:main

Once done, tell employees to open http://<server-ip>:3000 in their browser — everyone gets a ChatGPT-like AI chatbot instantly. All data stays within the organization's network, in line with PDPA requirements and the organization's data security policies.

Saeree ERP + Ollama Server:

Organizations using Saeree ERP can set up an Ollama Server on the same network to analyze ERP data with AI without sending data outside. Interested? Consult our team free of charge.


"Installing Ollama takes just 5 minutes, but what you get is a private AI that works 24/7 without monthly subscription fees."

- Saeree ERP Team


Interested in ERP for Your Organization?

Consult with Grand Linux Solution experts — free of charge

Request a Free Demo

Call 02-347-7730 | sale@grandlinux.com


About the Author

Paitoon Butri

Network & Server Security Specialist, Grand Linux Solution Co., Ltd.
