31 March
DeepSeek Series EP.4
In EP.3 of the DeepSeek Series, we explained how sending data to DeepSeek's servers in China carries significant Data Privacy risks — from China's National Intelligence Law to Thailand's PDPA. But what if you never send data out at all? What if you download DeepSeek's model and run it on your own organization's servers — where not a single byte of data ever leaves your internal network? This is the concept behind Self-hosting AI, and it is the answer for organizations that want to use open-source AI without risking data leakage.
However, self-hosting is not free — it requires hardware investment, an IT team to maintain the infrastructure, and a solid understanding of each model size's limitations. This article walks you through every aspect of self-hosting DeepSeek, from which models can run on a laptop to how many millions of baht you need for enterprise deployment, complete with step-by-step installation instructions you can follow right away.
Quick Summary — What is Self-Hosting DeepSeek?
- Self-hosting DeepSeek means downloading the model and running it on your organization's own servers — data never leaves your internal network
- Requires 4-512GB+ RAM depending on the model size you choose (from 1.5B to 671B parameters)
- Start with DeepSeek-R1-Distill-Qwen-7B, which runs on a regular laptop, all the way up to V3 671B for enterprise use
- Cost-effective for organizations with high AI usage (more than $500/month in API costs) — self-hosting pays for itself within 1-2 years
- Easiest installation via Ollama — just 3 commands to get started
What is Self-Hosting? How Does It Differ from Using an API?
Before diving into the details, let us clarify the difference between "self-hosting" and "using an API," because these two approaches have vastly different pros and cons. Choosing the wrong one can cost your organization both money and time unnecessarily.
| Aspect | API (Cloud) | Self-Host (On-Premise) |
|---|---|---|
| Data Location | DeepSeek's servers (China) | Your organization's servers |
| Cost | Pay per use (per token) | One-time hardware investment |
| Data Privacy | Data leaves your organization | 100% internal |
| Customization | Limited — only what DeepSeek exposes | Full fine-tuning capabilities |
| Maintenance | No maintenance required | Requires an IT team |
| Uptime | Depends on DeepSeek (outages of up to 7 hours have occurred) | 100% under your control |
Put simply, using an API is like renting AI — convenient and maintenance-free, but your data leaves the organization. Self-hosting is like buying AI and keeping it at home — a one-time investment where data stays with you, but you are responsible for upkeep. This is the same concept as on-premise ERP systems that many organizations are already familiar with.
DeepSeek Models Available for Self-Hosting — From Laptop to Data Center
DeepSeek offers models in multiple sizes for download, ranging from tiny models that run on a regular laptop to massive models that require multi-million-baht GPU servers. This is the most important table in this article — bookmark it for reference:
| Model | Size | RAM Required | GPU | Best For |
|---|---|---|---|---|
| R1-Distill-Qwen-1.5B | 1.5B | 4GB | Not required (CPU only) | Testing, learning |
| R1-Distill-Qwen-7B | 7B | 8-16GB | GPU 8GB+ or CPU | Simple chatbots, document summarization |
| R1-Distill-Qwen-32B | 32B | 24-48GB | GPU 24GB+ (RTX 4090) | Mid-level analysis tasks |
| R1-Distill-Llama-70B | 70B | 48-96GB | GPU 2x48GB (A6000) | High-quality output tasks |
| DeepSeek-V3 (full) | 671B | 350-500GB+ | 8xA100/H100 80GB | Enterprise, production |
| DeepSeek-R1 (full) | 671B | 350-500GB+ | 8xA100/H100 80GB | Advanced reasoning tasks |
What is particularly noteworthy is that the 7B model, distilled from the full-size model, delivers surprisingly good results — outscoring GPT-3.5 (ChatGPT's standard model just two years ago) on many benchmarks, despite running on a regular laptop. For a deeper technical dive into the Mixture of Experts (MoE) architecture that makes DeepSeek so resource-efficient, check out EP.2 of this series.
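As a quick sizing sketch, you can estimate the memory a 4-bit quantized model needs from its parameter count. The bytes-per-parameter and overhead figures below are rules of thumb, not official DeepSeek numbers:

```shell
# Rule-of-thumb RAM estimate for a 4-bit quantized model:
# ~0.5 bytes per parameter for the weights, plus ~20% overhead
# for the KV cache and runtime (both figures are assumptions).
estimate_ram_gb() {
  awk -v p="$1" 'BEGIN { printf "%.1f", p * 0.5 * 1.2 }'
}

echo "7B  -> ~$(estimate_ram_gb 7) GB"   # ~4.2 GB
echo "32B -> ~$(estimate_ram_gb 32) GB"  # ~19.2 GB
echo "70B -> ~$(estimate_ram_gb 70) GB"  # ~42.0 GB
```

These estimates cover the model alone; the higher figures in the table above leave headroom for the operating system, long contexts, and concurrent users.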
Real Hardware Costs — A 3-Tier Comparison
Now for the question everyone wants answered: how much do you need to invest? We have broken this down into 3 tiers based on organization size and number of users:
| Tier | Model | Hardware | Approx. Cost | Best For |
|---|---|---|---|---|
| Starter | 7B (Quantized) | Mac Mini M4 Pro 48GB or PC + RTX 4060 | ฿40,000-80,000 | Testing, 1-5 users |
| Mid-Range | 32B-70B | PC + RTX 4090 24GB or Mac Studio M4 Ultra 192GB | ฿100,000-250,000 | Department of 10-30 users |
| Enterprise | 671B (V3/R1) | Server 8xA100 80GB or Mac Cluster | ฿3,000,000-8,000,000 | Entire organization, 100+ users |
What stands out is the "Starter" tier — a Mac Mini M4 Pro costing just around 40,000 baht can run the 7B model smoothly, responding to queries in 1-3 seconds. This makes it ideal for organizations that want to experiment with self-hosted AI before committing to a serious investment. At the Enterprise tier, running the full 671B-parameter model may cost up to 8 million baht in hardware, but for organizations with high AI usage volume, it pays off compared to long-term API costs.
How to Self-Host DeepSeek — Step by Step
The easiest way to get started with self-hosting DeepSeek is through Ollama, an open-source tool that makes running LLMs on your own machine as easy as running Docker — just 3 commands and you are up and running:
Step 1: Install Ollama
For Linux/macOS, open your Terminal and run a single command:
curl -fsSL https://ollama.com/install.sh | sh
For macOS, you can also download directly from ollama.com or use Homebrew: brew install ollama. For Windows, download the installer from the Ollama website.
Step 2: Download the DeepSeek Model
Choose the model that matches your hardware, then run:
# 7B model — suitable for regular laptops/PCs (8GB+ RAM)
ollama pull deepseek-r1:7b
# 32B model — suitable for PCs with GPU 24GB+
ollama pull deepseek-r1:32b
# 70B model — requires GPU 48GB+ or 96GB+ RAM
ollama pull deepseek-r1:70b
Ollama automatically downloads the model in a Quantized (compressed) format, which significantly reduces the file size and RAM requirements while only marginally reducing output quality.
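To see why quantization matters, compare the approximate storage for a 7B model at FP16 (2 bytes per weight) against 4-bit (~0.5 bytes per weight). Treat these as lower bounds — real quantized files such as q4_K_M mix precisions and come out somewhat larger:

```shell
# Approximate weight storage for a 7B-parameter model at two precisions.
FP16_GB=$(awk 'BEGIN { printf "%.0f", 7e9 * 2.0 / 1e9 }')
Q4_GB=$(awk 'BEGIN { printf "%.1f", 7e9 * 0.5 / 1e9 }')
echo "7B at FP16: ~${FP16_GB} GB; 4-bit quantized: ~${Q4_GB} GB"
```

Roughly a 4x reduction — which is what turns a model that would need a workstation-class GPU into one that fits on a regular laptop.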
Step 3: Run and Start Using It
ollama run deepseek-r1:7b
That is it — you can now chat with DeepSeek running entirely on your own machine. All data stays on your device; nothing is sent externally.
Step 4: Connect via API (For Integration with Other Systems)
Ollama includes a built-in API server. Its native endpoints live under /api, and it also exposes OpenAI-compatible endpoints under /v1 — so applications that currently connect to the ChatGPT API can switch to Ollama with virtually no code changes:
curl http://localhost:11434/api/generate \
-d '{"model": "deepseek-r1:7b", "prompt": "Summarize this financial report...", "stream": false}'
Beyond Ollama, there are several alternative self-hosting tools suited to different use cases:
| Tool | Key Strength | Best For |
|---|---|---|
| Ollama | Easiest setup, install in 3 minutes | Getting started, testing, general use |
| vLLM | Fastest inference, high concurrent user support | Production, Enterprise |
| llama.cpp | Lightest weight, runs well on CPU | Machines without a GPU |
| Hugging Face TGI | Supports many model types, includes a dashboard | Data Science teams |
Cost Comparison: Self-Host vs API — Over 1-3 Years
This is the table that decision-makers need to see — a real cost comparison between DeepSeek API and self-hosting over a 1-3 year period. The scenario assumes an organization processes approximately 10 million tokens per day (roughly 20-30 regular users):
| Item | DeepSeek API | Self-Host 7B | Self-Host 70B |
|---|---|---|---|
| Hardware Cost | ฿0 | ฿60,000 (one-time) | ฿200,000 (one-time) |
| Monthly Running Cost | ~฿3,000 (API) | ~฿500 (electricity) | ~฿2,000 (electricity) |
| 1-Year Total | ฿36,000 | ฿66,000 | ฿224,000 |
| 3-Year Total | ฿108,000 | ฿78,000 ✅ | ฿272,000 |
| Data Privacy | ❌ Data goes to China | ✅ Stays internal | ✅ Stays internal |
Note: The figures above are estimates for 10M tokens/day. If your organization processes higher volumes (more than 50M tokens/day), self-hosting becomes cost-effective much faster — potentially reaching ROI within 6 months, because API costs scale linearly with usage while hardware costs are fixed.
As the table shows, self-hosting the 7B model breaks even with the API at around the two-year mark and is cheaper every month thereafter. Although the model is smaller, it performs remarkably well for everyday tasks like document summarization, question answering, and report drafting. The 70B model carries higher hardware costs, making it better suited for organizations that need premium output quality and are willing to accept a larger upfront investment.
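The break-even point can be checked with simple arithmetic, using the Starter-tier figures from the table above — an illustration, not a quote:

```shell
# Months until the Starter-tier hardware pays for itself,
# using the figures from the comparison table above.
HARDWARE=60000            # one-time hardware cost (baht)
API_MONTHLY=3000          # what the API would cost per month
ELECTRICITY_MONTHLY=500   # self-host running cost per month

SAVINGS=$((API_MONTHLY - ELECTRICITY_MONTHLY))
BREAKEVEN=$((HARDWARE / SAVINGS))
echo "Break-even after ${BREAKEVEN} months"  # 24 months
```

Substituting your own token volume and hardware quote into the same formula tells you whether self-hosting makes sense for your organization.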
Pros and Cons of Self-Hosting DeepSeek
Before making a decision, let us examine all the advantages and disadvantages:
| ✅ Pros | ❌ Cons |
|---|---|
| 100% Data Privacy — not a single byte leaves your organization | High upfront investment — hardware must be purchased in advance, especially GPUs |
| No rate limits — use it as much as you want with no daily token cap | Requires an IT team — for updates, monitoring, and troubleshooting |
| Full fine-tuning — customize the model to fit your organization's specific data and workflows | Manual model updates — when DeepSeek releases new versions, you must update them yourself |
| No vendor lock-in — switch to a different model at any time | Distilled quality may be lower — smaller models work well but cannot match the full 671B model |
| No outage dependency — even if DeepSeek goes down, your system is unaffected | Higher electricity costs — GPU servers consume significant power (฿500-5,000/month) |
| Supports PDPA compliance — ideal for security-focused organizations | Higher latency possible — large models on limited hardware may respond slower than the API |
Security Considerations for Self-Hosting
While self-hosting solves the problem of data being sent to China, there are still cybersecurity concerns that must be addressed:
- Network Isolation: The AI server should be placed in a separate VLAN and should not be directly accessible from the internet
- Authentication: Implement API keys or tokens for access — never expose the API publicly without authentication
- Logging: Record every request to enable audit trails — who asked what, and when
- Model Integrity: Download models only from the official Hugging Face repository and verify checksums every time
- Prompt Injection: Guard against prompt injection attacks by implementing input validation before passing data to the model
Organizations with a robust Disaster Recovery plan will be well-positioned to self-host AI with confidence, as they already have backup and recovery systems in place to handle potential issues.
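The logging point above can be sketched as a thin wrapper around the model call — the log path and function name are illustrative, and the curl call is left commented so the logging itself runs without a live Ollama server:

```shell
# Minimal audit-trail wrapper: record who asked what, and when,
# before the prompt reaches the model. Path and names are examples.
LOGFILE=/tmp/ai_audit.log

ask_model() {
  user=$1
  prompt=$2
  # Append a tab-separated audit line: timestamp, user, prompt.
  printf '%s\t%s\t%s\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$user" "$prompt" >> "$LOGFILE"
  # Forward to the local model (requires a running Ollama instance):
  # curl -s http://localhost:11434/api/generate \
  #   -d "{\"model\": \"deepseek-r1:7b\", \"prompt\": \"$prompt\", \"stream\": false}"
}

ask_model "alice" "Summarize Q3 revenue"
```

In production you would ship these log lines to a central, append-only store so the audit trail survives even if the AI server itself is compromised.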
Real-World Use Cases for Thai Organizations
Many Thai organizations have already begun self-hosting AI, particularly in industries where data confidentiality is paramount:
- Private hospitals: Using the 7B model to summarize medical records internally — patient data never leaves the hospital network
- Law firms: Using the 32B model to analyze contracts and search through hundreds of pages of legal documents
- Manufacturing companies: Using AI to analyze production line data, reduce waste, and minimize machine downtime — production data should never be sent externally
- Government agencies: Government data cannot be sent to China — self-hosting is the only viable option for using AI
- Financial institutions: Using AI for risk analysis while keeping financial data internal in compliance with Bank of Thailand regulations
For organizations already running an ERP system, self-hosted AI can be integrated with ERP to assist with data analysis, report generation, and answering employee queries about system data. This is precisely the direction the AI industry is heading in 2026.
Summary of Recommendations — How to Choose?
- Limited budget + testing first → Start with Ollama + 7B model on a Mac/PC you already own — completely free, the only investment is your time
- Production use for 10-30 users → Invest in an RTX 4090 + 32B model — quality is sufficient for analysis, summarization, and writing assistance
- Enterprise + Compliance → Invest in a GPU server + 671B model or use a Mac Cluster — API-equivalent quality with data staying fully internal
- No IT team → Use the ChatGPT or Claude API instead (safer than the DeepSeek API which sends data to China), as they offer clear DPAs and store data in the United States
- Highly confidential data → Self-hosting is the only truly safe option, whether with DeepSeek or any other model
DeepSeek Series — Read More
DeepSeek Series — 5 Episodes on the Chinese AI Challenger:
- EP.1: What is DeepSeek? — The Chinese AI That Shook the World
- EP.2: Mixture of Experts — The Technique That Makes It 10x Cheaper
- EP.3: Risks of Chinese AI — What Thai Organizations Must Know
- EP.4: Self-Hosting DeepSeek — Is It Worth It? What Do You Need? (this article)
- EP.5: Can DeepSeek Really Help with ERP?
Self-hosting AI is no longer a distant concept — a 40,000-baht Mac Mini can now run an AI model more capable than ChatGPT was just two years ago.
— Saeree ERP Team
