31 March
DeepSeek Series EP.4
In EP.3 of the DeepSeek Series, we explained how sending data to DeepSeek's servers in China carries significant Data Privacy risks — from China's National Intelligence Law to Thailand's PDPA. But what if you never send data out at all? What if you download DeepSeek's model and run it on your own organization's servers — where not a single byte of data ever leaves your internal network? This is the concept behind Self-hosting AI, and it is the answer for organizations that want to use open-source AI without risking data leakage.
However, self-hosting is not free — it requires hardware investment, an IT team to maintain the infrastructure, and a solid understanding of each model size's limitations. This article walks you through every aspect of self-hosting DeepSeek, from which models can run on a laptop to how many millions of baht you need for enterprise deployment, complete with step-by-step installation instructions you can follow right away.
Quick Summary — What is Self-Hosting DeepSeek?
- Self-hosting DeepSeek means downloading the model and running it on your organization's own servers — data never leaves your internal network
- Requires 4-512GB+ RAM depending on the model size you choose (from 1.5B to 671B parameters)
- Start with DeepSeek-R1-Distill-Qwen-7B, which runs on a regular laptop, all the way up to V3 671B for enterprise use
- Cost-effective for organizations with high AI usage (more than $500/month in API costs) — self-hosting pays for itself within 1-2 years
- Easiest installation via Ollama — just 3 commands to get started
What is Self-Hosting? How Does It Differ from Using an API?
Before diving into the details, let us clarify the difference between "self-hosting" and "using an API," because these two approaches have vastly different pros and cons. Choosing the wrong one can cost your organization both money and time unnecessarily.
| Aspect | API (Cloud) | Self-Host (On-Premise) |
|---|---|---|
| Data Location | DeepSeek's servers (China) | Your organization's servers |
| Cost | Pay per use (per token) | One-time hardware investment |
| Data Privacy | Data leaves your organization | 100% internal |
| Customization | Limited — only what DeepSeek exposes | Full fine-tuning capabilities |
| Maintenance | No maintenance required | Requires an IT team |
| Uptime | Depends on DeepSeek (outages of up to 7 hours have occurred) | 100% under your control |
Put simply, using an API is like renting AI — convenient and maintenance-free, but your data leaves the organization. Self-hosting is like buying AI and keeping it at home — a one-time investment where data stays with you, but you are responsible for upkeep. This is the same concept as on-premise ERP systems that many organizations are already familiar with.
DeepSeek Models Available for Self-Hosting — From Laptop to Data Center
DeepSeek offers models in multiple sizes for download, ranging from tiny models that run on a regular laptop to massive models that require multi-million-baht GPU servers. This is the most important table in this article — bookmark it for reference:
| Model | Size | RAM Required | GPU | Best For |
|---|---|---|---|---|
| R1-Distill-Qwen-1.5B | 1.5B | 4GB | Not required (CPU only) | Testing, learning |
| R1-Distill-Qwen-7B | 7B | 8-16GB | GPU 8GB+ or CPU | Simple chatbots, document summarization |
| R1-Distill-Qwen-32B | 32B | 24-48GB | GPU 24GB+ (RTX 4090) | Mid-level analysis tasks |
| R1-Distill-Llama-70B | 70B | 48-96GB | GPU 2x48GB (A6000) | High-quality output tasks |
| DeepSeek-V3 (full) | 671B | 350-500GB+ | 8xA100/H100 80GB | Enterprise, production |
| DeepSeek-R1 (full) | 671B | 350-500GB+ | 8xA100/H100 80GB | Advanced reasoning tasks |
What is particularly noteworthy is that the 7B model, distilled from the full-size model, delivers surprisingly good results — outscoring GPT-3.5 (ChatGPT's standard model just two years ago) on many benchmarks, despite running on a regular laptop. For a deeper technical dive into the Mixture of Experts (MoE) architecture that makes DeepSeek so resource-efficient, check out EP.2 of this series.
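As a quick sizing sketch, you can estimate the memory a 4-bit quantized model needs from its parameter count. The bytes-per-parameter and overhead figures below are rules of thumb, not official DeepSeek numbers:

```shell
# Rule-of-thumb RAM estimate for a 4-bit quantized model:
# ~0.5 bytes per parameter for the weights, plus ~20% overhead
# for the KV cache and runtime (both figures are assumptions).
estimate_ram_gb() {
  awk -v p="$1" 'BEGIN { printf "%.1f", p * 0.5 * 1.2 }'
}

echo "7B  -> ~$(estimate_ram_gb 7) GB"   # ~4.2 GB
echo "32B -> ~$(estimate_ram_gb 32) GB"  # ~19.2 GB
echo "70B -> ~$(estimate_ram_gb 70) GB"  # ~42.0 GB
```

These estimates cover the model alone; the higher figures in the table above leave headroom for the operating system, long contexts, and concurrent users.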
Real Hardware Costs — A 3-Tier Comparison
Now for the question everyone wants answered: how much do you need to invest? We have broken this down into 3 tiers based on organization size and number of users:
| Tier | Model | Hardware | Approx. Cost | Best For |
|---|---|---|---|---|
| Starter | 7B (Quantized) | Mac Mini M4 Pro 48GB or PC + RTX 4060 | ฿40,000-80,000 | Testing, 1-5 users |
| Mid-Range | 32B-70B | PC + RTX 4090 24GB or Mac Studio M4 Ultra 192GB | ฿100,000-250,000 | Department of 10-30 users |
| Enterprise | 671B (V3/R1) | Server 8xA100 80GB or Mac Cluster | ฿3,000,000-8,000,000 | Entire organization, 100+ users |
What stands out is the "Starter" tier — a Mac Mini M4 Pro costing just around 40,000 baht can run the 7B model smoothly, responding to queries in 1-3 seconds. This makes it ideal for organizations that want to experiment with self-hosted AI before committing to a serious investment. At the Enterprise tier, running the full 671B-parameter model may cost up to 8 million baht in hardware, but for organizations with high AI usage volume, it pays off compared to long-term API costs.
How to Self-Host DeepSeek — Step by Step
The easiest way to get started with self-hosting DeepSeek is through Ollama, an open-source tool that makes running LLMs on your own machine as easy as running Docker — just 3 commands and you are up and running:
Step 1: Install Ollama
For Linux/macOS, open your Terminal and run a single command:
curl -fsSL https://ollama.com/install.sh | sh
For macOS, you can also download directly from ollama.com or use Homebrew: brew install ollama. For Windows, download the installer from the Ollama website.
Step 2: Download the DeepSeek Model
Choose the model that matches your hardware, then run:
# 7B model — suitable for regular laptops/PCs (8GB+ RAM)
ollama pull deepseek-r1:7b
# 32B model — suitable for PCs with GPU 24GB+
ollama pull deepseek-r1:32b
# 70B model — requires GPU 48GB+ or 96GB+ RAM
ollama pull deepseek-r1:70b
Ollama automatically downloads the model in a Quantized (compressed) format, which significantly reduces the file size and RAM requirements while only marginally reducing output quality.
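To see why quantization matters, compare the approximate storage for a 7B model at FP16 (2 bytes per weight) against 4-bit (~0.5 bytes per weight). Treat these as lower bounds — real quantized files such as q4_K_M mix precisions and come out somewhat larger:

```shell
# Approximate weight storage for a 7B-parameter model at two precisions.
FP16_GB=$(awk 'BEGIN { printf "%.0f", 7e9 * 2.0 / 1e9 }')
Q4_GB=$(awk 'BEGIN { printf "%.1f", 7e9 * 0.5 / 1e9 }')
echo "7B at FP16: ~${FP16_GB} GB; 4-bit quantized: ~${Q4_GB} GB"
```

Roughly a 4x reduction — which is what turns a model that would need a workstation-class GPU into one that fits on a regular laptop.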
Step 3: Run and Start Using It
ollama run deepseek-r1:7b
That is it — you can now chat with DeepSeek running entirely on your own machine. All data stays on your device; nothing is sent externally.
Step 4: Connect via API (For Integration with Other Systems)
Ollama includes a built-in API server. Its native endpoints live under /api, and it also exposes OpenAI-compatible endpoints under /v1 — so applications that currently connect to the ChatGPT API can switch to Ollama with virtually no code changes:
curl http://localhost:11434/api/generate \
-d '{"model": "deepseek-r1:7b", "prompt": "Summarize this financial report...", "stream": false}'
Beyond Ollama, there are several alternative self-hosting tools suited to different use cases:
| Tool | Key Strength | Best For |
|---|---|---|
| Ollama | Easiest setup, install in 3 minutes | Getting started, testing, general use |
| vLLM | Fastest inference, high concurrent user support | Production, Enterprise |
| llama.cpp | Lightest weight, runs well on CPU | Machines without a GPU |
| Hugging Face TGI | Supports many model types, includes a dashboard | Data Science teams |
Cost Comparison: Self-Host vs API — Over 1-3 Years
This is the table that decision-makers need to see — a real cost comparison between DeepSeek API and self-hosting over a 1-3 year period. The scenario assumes an organization processes approximately 10 million tokens per day (roughly 20-30 regular users):
| Item | DeepSeek API | Self-Host 7B | Self-Host 70B |
|---|---|---|---|
| Hardware Cost | ฿0 | ฿60,000 (one-time) | ฿200,000 (one-time) |
| Monthly Running Cost | ~฿3,000 (API) | ~฿500 (electricity) | ~฿2,000 (electricity) |
| 1-Year Total | ฿36,000 | ฿66,000 | ฿224,000 |
| 3-Year Total | ฿108,000 | ฿78,000 ✅ | ฿272,000 |
| Data Privacy | ❌ Data goes to China | ✅ Stays internal | ✅ Stays internal |
Note: The figures above are estimates for 10M tokens/day. If your organization processes higher volumes (more than 50M tokens/day), self-hosting becomes cost-effective much faster — potentially reaching ROI within 6 months, because API costs scale linearly with usage while hardware costs are fixed.
As the table shows, self-hosting the 7B model breaks even with the API at around the two-year mark and is cheaper every month thereafter. Although the model is smaller, it performs remarkably well for everyday tasks like document summarization, question answering, and report drafting. The 70B model carries higher hardware costs, making it better suited for organizations that need premium output quality and are willing to accept a larger upfront investment.
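The break-even point can be checked with simple arithmetic, using the Starter-tier figures from the table above — an illustration, not a quote:

```shell
# Months until the Starter-tier hardware pays for itself,
# using the figures from the comparison table above.
HARDWARE=60000            # one-time hardware cost (baht)
API_MONTHLY=3000          # what the API would cost per month
ELECTRICITY_MONTHLY=500   # self-host running cost per month

SAVINGS=$((API_MONTHLY - ELECTRICITY_MONTHLY))
BREAKEVEN=$((HARDWARE / SAVINGS))
echo "Break-even after ${BREAKEVEN} months"  # 24 months
```

Substituting your own token volume and hardware quote into the same formula tells you whether self-hosting makes sense for your organization.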
Pros and Cons of Self-Hosting DeepSeek
Before making a decision, let us examine all the advantages and disadvantages:
| ✅ Pros | ❌ Cons |
|---|---|
| 100% Data Privacy — not a single byte leaves your organization | High upfront investment — hardware must be purchased in advance, especially GPUs |
| No rate limits — use it as much as you want with no daily token cap | Requires an IT team — for updates, monitoring, and troubleshooting |
| Full fine-tuning — customize the model to fit your organization's specific data and workflows | Manual model updates — when DeepSeek releases new versions, you must update them yourself |
| No vendor lock-in — switch to a different model at any time | Distilled quality may be lower — smaller models work well but cannot match the full 671B model |
| No outage dependency — even if DeepSeek goes down, your system is unaffected | Higher electricity costs — GPU servers consume significant power (฿500-5,000/month) |
| Supports PDPA compliance — ideal for security-focused organizations | Higher latency possible — large models on limited hardware may respond slower than the API |
Security Considerations for Self-Hosting
While self-hosting solves the problem of data being sent to China, there are still cybersecurity concerns that must be addressed:
- Network Isolation: The AI server should be placed in a separate VLAN and should not be directly accessible from the internet
- Authentication: Implement API keys or tokens for access — never expose the API publicly without authentication
- Logging: Record every request to enable audit trails — who asked what, and when
- Model Integrity: Download models only from the official Hugging Face repository and verify checksums every time
- Prompt Injection: Guard against prompt injection attacks by implementing input validation before passing data to the model
Organizations with a robust Disaster Recovery plan will be well-positioned to self-host AI with confidence, as they already have backup and recovery systems in place to handle potential issues.
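The logging point above can be sketched as a thin wrapper around the model call — the log path and function name are illustrative, and the curl call is left commented so the logging itself runs without a live Ollama server:

```shell
# Minimal audit-trail wrapper: record who asked what, and when,
# before the prompt reaches the model. Path and names are examples.
LOGFILE=/tmp/ai_audit.log

ask_model() {
  user=$1
  prompt=$2
  # Append a tab-separated audit line: timestamp, user, prompt.
  printf '%s\t%s\t%s\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$user" "$prompt" >> "$LOGFILE"
  # Forward to the local model (requires a running Ollama instance):
  # curl -s http://localhost:11434/api/generate \
  #   -d "{\"model\": \"deepseek-r1:7b\", \"prompt\": \"$prompt\", \"stream\": false}"
}

ask_model "alice" "Summarize Q3 revenue"
```

In production you would ship these log lines to a central, append-only store so the audit trail survives even if the AI server itself is compromised.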
Real-World Use Cases for Thai Organizations
Many Thai organizations have already begun self-hosting AI, particularly in industries where data confidentiality is paramount:
- Private hospitals: Using the 7B model to summarize medical records internally — patient data never leaves the hospital network
- Law firms: Using the 32B model to analyze contracts and search through hundreds of pages of legal documents
- Manufacturing companies: Using AI to analyze production line data, reduce waste, and minimize machine downtime — production data should never be sent externally
- Government agencies: Government data cannot be sent to China — self-hosting is the only viable option for using AI
- Financial institutions: Using AI for risk analysis while keeping financial data internal in compliance with Bank of Thailand regulations
For organizations already running an ERP system, self-hosted AI can be integrated with ERP to assist with data analysis, report generation, and answering employee queries about system data. This is precisely the direction the AI industry is heading in 2026.
Summary of Recommendations — How to Choose?
- Limited budget + testing first → Start with Ollama + 7B model on a Mac/PC you already own — completely free, the only investment is your time
- Production use for 10-30 users → Invest in an RTX 4090 + 32B model — quality is sufficient for analysis, summarization, and writing assistance
- Enterprise + Compliance → Invest in a GPU server + 671B model or use a Mac Cluster — API-equivalent quality with data staying fully internal
- No IT team → Use the ChatGPT or Claude API instead (safer than the DeepSeek API which sends data to China), as they offer clear DPAs and store data in the United States
- Highly confidential data → Self-hosting is the only truly safe option, whether with DeepSeek or any other model
DeepSeek Series — Read More
DeepSeek Series — 5 Episodes on the Chinese AI Challenger:
- EP.1: What is DeepSeek? — The Chinese AI That Shook the World
- EP.2: Mixture of Experts — The Technique That Makes It 10x Cheaper
- EP.3: Risks of Chinese AI — What Thai Organizations Must Know
- EP.4: Self-Hosting DeepSeek — Is It Worth It? What Do You Need? (this article)
- EP.5: Can DeepSeek Really Help with ERP?
Self-hosting AI is no longer a distant concept — a 40,000-baht Mac Mini can now run an AI model more capable than ChatGPT was just two years ago.
— Saeree ERP Team
