2 April
Ollama Series EP.2 — From EP.1 where we explained what Ollama is, now it's time to get hands-on! This article walks you through installing Ollama from scratch on macOS, Windows, or Linux — and running your first AI model within 5 minutes, including essential commands, choosing the right model for your hardware, and troubleshooting common issues.
In short — Installing Ollama is easier than you think
- macOS: Download .dmg from website → drag to Applications → done
- Windows: Download .exe → install → done
- Linux: Single command: curl -fsSL https://ollama.com/install.sh | sh
- Run your first model: ollama run llama3.1 — get an AI chatbot instantly
- Installation takes under 5 minutes (not including model download time)
System Requirements — What Hardware Do You Need?
Before installing, check if your machine meets the requirements. Ollama doesn't need a powerful machine — a regular laptop with 8 GB RAM can run 7-8B models. Having a GPU significantly speeds things up
| Spec | Minimum | Recommended | Enterprise |
|---|---|---|---|
| OS | macOS 11+, Windows 10+, Linux | macOS 14+, Windows 11, Ubuntu 22.04+ | Ubuntu 22.04 LTS Server |
| RAM | 8 GB | 16 GB | 64-128 GB |
| Disk Space | 10 GB | 50 GB | 500 GB+ SSD |
| CPU | x86_64 or Apple Silicon | Apple M1+ or Intel i7+ | AMD EPYC / Intel Xeon |
| GPU (Optional) | Not required (CPU works) | NVIDIA RTX 3060+ (VRAM 8 GB+) | NVIDIA RTX 4090 / A100 (VRAM 24-80 GB) |
| Models You Can Run | 7-8B (Llama 3.1 8B, Gemma 2 9B) | 14-34B (Phi-4, CodeLlama 34B) | 70B+ (Llama 3.1 70B, Qwen 72B) |
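The RAM column above follows a simple rule of thumb: a model's weights take roughly its parameter count times the bytes per weight (Ollama pulls 4-bit quantized models by default, so about 0.5 bytes per parameter), plus some headroom for context and the OS. A minimal sketch of that estimate — the 1.2 overhead factor here is an illustrative assumption, not an official Ollama figure:

```python
def estimate_ram_gb(params_billion: float, bits_per_weight: int = 4, overhead: float = 1.2) -> float:
    """Rough RAM estimate for running a locally quantized model.

    params_billion: model size in billions of parameters (e.g. 8 for Llama 3.1 8B)
    bits_per_weight: 4 for the default quantized downloads, 16 for full precision
    overhead: headroom factor for KV cache and runtime (assumed, not measured)
    """
    weight_gb = params_billion * bits_per_weight / 8  # 1B params at 8 bits = 1 GB
    return round(weight_gb * overhead, 1)

# An 8B model at 4-bit quantization fits comfortably in 8 GB of RAM...
print(estimate_ram_gb(8))    # ~4.8
# ...while a 70B model lands in the "Enterprise" column
print(estimate_ram_gb(70))   # ~42.0
```

This is why the Minimum column pairs 8 GB RAM with 7-8B models, and why 70B+ models need server-class memory.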
Apple Silicon Has the Advantage!
Macs with M1, M2, M3, M4 chips have a significant advantage for running Ollama because Unified Memory lets CPU and GPU share RAM — a MacBook Pro M3 Pro with 18 GB RAM can smoothly run 14B models, while a PC with 16 GB RAM may need a discrete GPU
Install on macOS
macOS was the first platform Ollama supported and works best, especially on Apple Silicon:
Method 1: Download from Website (Recommended)
- Go to ollama.com and click "Download for macOS"
- Open the downloaded Ollama-darwin.zip file
- Drag Ollama.app to Applications
- Open Ollama.app — you'll see the Llama icon in the Menu Bar (top right)
- Open Terminal and type ollama run llama3.1
Method 2: Via Homebrew
For developers who already use Homebrew:
brew install ollama
ollama serve
Install on Windows
Ollama supports Windows 10 and above (64-bit), including NVIDIA GPU, AMD GPU (Radeon RX 6000+), and CPU-only:
- Go to ollama.com and click "Download for Windows"
- Run OllamaSetup.exe
- Follow the installation steps (click Next → Install → Finish)
- Ollama will run as a Background Service automatically — look for the icon in System Tray
- Open PowerShell or Command Prompt and type ollama run llama3.1
Windows: Important Notes
- 64-bit Windows only (32-bit is not supported)
- If using NVIDIA GPU, update drivers to version 452.39 or above
- Windows Defender may warn during download — click "More info" → "Run anyway" (Ollama is safe open-source software)
- If your organization's firewall blocks it, open Port 11434 (Ollama API)
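If Windows Defender Firewall is what's blocking access, one way to open the port is with a rule like the following, run from an elevated (Administrator) prompt. The rule name "Ollama API" is just an example — adjust the name and scope to your organization's policy:

```shell
:: Allow inbound TCP on port 11434 (the Ollama API) -- run as Administrator
netsh advfirewall firewall add rule name="Ollama API" dir=in action=allow protocol=TCP localport=11434
```

Only do this if other machines genuinely need to reach your Ollama instance; for local-only use the default localhost binding needs no firewall rule.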
Install on Linux
Linux is the best platform for setting up an organization's Ollama Server — stable and resource-efficient:
Method 1: Automatic Script (Recommended)
curl -fsSL https://ollama.com/install.sh | sh
This script downloads Ollama, installs it, creates a systemd service, and starts the Ollama Server automatically. Supports Ubuntu, Debian, Fedora, CentOS, RHEL, Arch Linux, and more
Method 2: Download Binary Manually
# Download Binary (sudo needed to write to /usr/local/bin)
sudo curl -L https://ollama.com/download/ollama-linux-amd64 -o /usr/local/bin/ollama
sudo chmod +x /usr/local/bin/ollama
# Start Server
ollama serve
Method 3: Docker
For organizations already using Docker — ideal for enterprise-level infrastructure management:
# CPU only
docker run -d -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama
# NVIDIA GPU
docker run -d --gpus=all -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama
# Run Model
docker exec -it ollama ollama run llama3.1
Run Your First AI Model
After installation, regardless of OS, the next step is the same — open Terminal and type:
ollama run llama3.1
The first time, Ollama will download the model (~4.7 GB for Llama 3.1 8B), which may take 2-10 minutes depending on your internet speed. After downloading, you'll see a prompt to type your question:
>>> Hello, what AI are you?
Hello! I am Llama 3.1, a Large Language Model developed by Meta.
I can help answer questions, write articles, translate languages, and more.
How can I help you?
Done! You now have a private AI chatbot on your own machine — everything you type stays on your computer; nothing is sent outside. Type /bye to exit, or press Ctrl+D
Essential Commands You Need to Know
After installing Ollama, these are the commands you'll use most frequently:
| Command | What It Does | Example |
|---|---|---|
| ollama run <model> | Run a model (auto-downloads if not present) | ollama run llama3.1 |
| ollama pull <model> | Download a model without running it | ollama pull qwen2.5:72b |
| ollama list | List downloaded models | ollama list |
| ollama rm <model> | Remove a model from your machine | ollama rm mistral |
| ollama show <model> | Show model info (size, parameters) | ollama show llama3.1 |
| ollama ps | Show currently running models | ollama ps |
| ollama serve | Start the Ollama Server (usually runs automatically) | ollama serve |
| ollama cp <src> <dst> | Copy an existing model under a new name | ollama cp llama3.1 my-assistant |
Test — Run Multiple Models for Comparison
After installation, try running multiple models to compare — each has different strengths:
# All-round model (by Meta)
ollama run llama3.1
# Strong Asian languages (by Alibaba)
ollama run qwen2.5
# Reasoning model - thinks before answering (by DeepSeek)
ollama run deepseek-r1:8b
# Coding model (by Alibaba)
ollama run qwen2.5-coder
# Google model — very fast
ollama run gemma2
# Vision model — reads images
ollama run llava
Try asking the same question to multiple models and compare answers — you'll see each has a different response style. Some excel at Asian languages, others at coding. For detailed model comparisons, see EP.1 Popular Models Table
Install GUI — Open WebUI (No Terminal Needed)
For those who prefer not to type commands in Terminal, install Open WebUI — a beautiful ChatGPT-like web interface that connects to Ollama instantly:
# Install Open WebUI with Docker (Docker required)
docker run -d -p 3000:8080 \
--add-host=host.docker.internal:host-gateway \
-v open-webui:/app/backend/data \
--name open-webui \
--restart always \
ghcr.io/open-webui/open-webui:main
Open your browser to http://localhost:3000 — create an admin account, select a downloaded model, and start using it! Open WebUI is perfect for organizations that want all employees to use AI without opening Terminal — just open the web and type like ChatGPT
Troubleshooting Common Issues
| Problem | Cause | Solution |
|---|---|---|
| "Error: model requires more memory" | Not enough RAM for the selected model | Use a smaller model, e.g., switch from 70B to 8B, or use a quantized version ollama run llama3.1:8b-q4_0 |
| "Error: listen tcp 127.0.0.1:11434: bind: address already in use" | Ollama is already running (possibly in background) | Stop the running Ollama first: pkill ollama (Linux/macOS) or close from System Tray (Windows) |
| GPU not being used (very slow) | Drivers not updated or Ollama can't detect GPU | Update NVIDIA drivers to latest version. Verify with nvidia-smi then restart Ollama |
| Poor Thai language / nonsensical answers | Some models aren't good at Thai | Switch to qwen2.5 or llama3.1:70b, which have better Thai support |
| Model download slow / interrupted | Slow or unstable internet | Run ollama pull <model> again — it will resume from where it stopped |
| Can't access from other machines (Server) | Ollama listens on localhost only | Set environment variable: OLLAMA_HOST=0.0.0.0 ollama serve |
| Disk full | Downloaded too many models | List models with ollama list, then remove unused ones with ollama rm <model> |
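A quick first step before working through the table above is to confirm the Ollama Server is actually up by querying its HTTP API on port 11434. Assuming a default local install:

```shell
# Returns "Ollama is running" if the server is up
curl http://localhost:11434

# Lists downloaded models as JSON (same information as `ollama list`)
curl http://localhost:11434/api/tags
```

If neither responds, the server isn't running — start it with `ollama serve` (or check the systemd service / System Tray icon) before troubleshooting anything else.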
For Organizations — Set Up Ollama Server for Everyone
To let everyone in the organization use Ollama together without installing on every machine — just set up one server and let everyone access via Open WebUI:
# 1. Install Ollama on Linux server
curl -fsSL https://ollama.com/install.sh | sh
# 2. Configure to listen on all IPs (not just localhost)
sudo systemctl edit ollama
# Add this line:
# [Service]
# Environment="OLLAMA_HOST=0.0.0.0"
# 3. Restart Ollama
sudo systemctl restart ollama
# 4. Download desired models
ollama pull llama3.1
ollama pull qwen2.5
ollama pull deepseek-r1:8b
# 5. Install Open WebUI with Docker
docker run -d -p 3000:8080 \
--add-host=host.docker.internal:host-gateway \
-e OLLAMA_BASE_URL=http://host.docker.internal:11434 \
-v open-webui:/app/backend/data \
--name open-webui \
--restart always \
ghcr.io/open-webui/open-webui:main
Once done, tell employees to open http://<server-ip>:3000 in their browser — everyone gets a ChatGPT-like AI chatbot instantly. All data stays within the organization's network, compliant with PDPA requirements and the organization's data security policies
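Before announcing the URL, it's worth verifying from any client machine on the network that both services are reachable — replace <server-ip> with your server's actual address:

```shell
# The Ollama API should respond (requires OLLAMA_HOST=0.0.0.0 from step 2)
curl http://<server-ip>:11434/api/tags

# Open WebUI should return an HTTP response on its login page
curl -I http://<server-ip>:3000
```

If the first command times out, recheck the systemd override and any firewall rules on port 11434; if only the second fails, the issue is with the Open WebUI container rather than Ollama.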
Saeree ERP + Ollama Server:
Organizations using Saeree ERP can set up an Ollama Server on the same network to analyze ERP data with AI without sending data outside — interested? Consult our team for free
Ollama Series — Read More
Ollama Series — 6 Episodes, Complete Local AI Guide:
- EP.1: What Is Ollama? — Run AI on Your Own Machine
- EP.2: Install Ollama on Every OS — macOS / Windows / Linux (this article)
- EP.3: Using Ollama for Real — Choosing Models, Writing Prompts, and Creating Modelfiles
- EP.4: Ollama + RAG — Build AI That Answers from Your Documents
- EP.5: Ollama API — Connect AI to Your Apps and Enterprise Systems
- EP.6: Secure Self-Hosted AI — Security & Best Practices
"Installing Ollama takes just 5 minutes, but what you get is a private AI that works 24/7 without monthly subscription fees."
- Saeree ERP Team
