
Self-Hosting DeepSeek: On-Premise AI for Organizations

31 March

DeepSeek Series EP.4
In EP.3 of the DeepSeek Series, we explained how sending data to DeepSeek's servers in China carries significant Data Privacy risks — from China's National Intelligence Law to Thailand's PDPA. But what if you never send data out at all? What if you download DeepSeek's model and run it on your own organization's servers — where not a single byte of data ever leaves your internal network? This is the concept behind Self-hosting AI, and it is the answer for organizations that want to use open-source AI without risking data leakage.

However, self-hosting is not free — it requires hardware investment, an IT team to maintain the infrastructure, and a solid understanding of each model size's limitations. This article walks you through every aspect of self-hosting DeepSeek, from which models can run on a laptop to how many millions of baht you need for enterprise deployment, complete with step-by-step installation instructions you can follow right away.

Quick Summary — What is Self-Hosting DeepSeek?

  • Self-hosting DeepSeek means downloading the model and running it on your organization's own servers — data never leaves your internal network
  • Requires 4-512GB+ RAM depending on the model size you choose (from 1.5B to 671B parameters)
  • Start with DeepSeek-R1-Distill-Qwen-7B, which runs on a regular laptop, all the way up to V3 671B for enterprise use
  • Cost-effective for organizations with high AI usage (more than $500/month in API costs) — self-hosting pays for itself within 1-2 years
  • Easiest installation via Ollama — just 3 commands to get started

What is Self-Hosting? How Does It Differ from Using an API?

Before diving into the details, let us clarify the difference between "self-hosting" and "using an API," because these two approaches have vastly different pros and cons. Choosing the wrong one can cost your organization both money and time unnecessarily.

Aspect | API (Cloud) | Self-Host (On-Premise)
Data Location | DeepSeek's servers (China) | Your organization's servers
Cost | Pay per use (per token) | One-time hardware investment
Data Privacy | Data leaves your organization | 100% internal
Customization | Limited — only what DeepSeek exposes | Full fine-tuning capabilities
Maintenance | No maintenance required | Requires an IT team
Uptime | Depends on DeepSeek (has gone down for 7 hours) | 100% under your control

Put simply, using an API is like renting AI — convenient and maintenance-free, but your data leaves the organization. Self-hosting is like buying AI and keeping it at home — a one-time investment where data stays with you, but you are responsible for upkeep. This is the same concept as on-premise ERP systems that many organizations are already familiar with.

DeepSeek Models Available for Self-Hosting — From Laptop to Data Center

DeepSeek offers models in multiple sizes for download, ranging from tiny models that run on a regular laptop to massive models that require multi-million-baht GPU servers. This is the most important table in this article — bookmark it for reference:

Model | Size | RAM Required | GPU | Best For
R1-Distill-Qwen-1.5B | 1.5B | 4GB | Not required (CPU only) | Testing, learning
R1-Distill-Qwen-7B | 7B | 8-16GB | GPU 8GB+ or CPU | Simple chatbots, document summarization
R1-Distill-Qwen-32B | 32B | 24-48GB | GPU 24GB+ (RTX 4090) | Mid-level analysis tasks
R1-Distill-Llama-70B | 70B | 48-96GB | 2x 48GB GPU (A6000) | High-quality output tasks
DeepSeek-V3 (full) | 671B | 350-500GB+ | 8x A100/H100 80GB | Enterprise, production
DeepSeek-R1 (full) | 671B | 350-500GB+ | 8x A100/H100 80GB | Advanced reasoning tasks

What is particularly noteworthy is that the 7B model, distilled from the full-size model, delivers surprisingly good results — outscoring GPT-3.5 (the model behind ChatGPT just two years ago) on many benchmarks, despite running on a regular laptop. For a deeper dive into the Mixture of Experts (MoE) architecture that makes DeepSeek so resource-efficient, check out EP.2 of this series.

Real Hardware Costs — A 3-Tier Comparison

Now for the question everyone wants answered: how much do you need to invest? We have broken this down into 3 tiers based on organization size and number of users:

Tier | Model | Hardware | Approx. Cost | Best For
Starter | 7B (Quantized) | Mac Mini M4 Pro 48GB or PC + RTX 4060 | ฿40,000-80,000 | Testing, 1-5 users
Mid-Range | 32B-70B | PC + RTX 4090 24GB or Mac Studio M4 Ultra 192GB | ฿100,000-250,000 | Department of 10-30 users
Enterprise | 671B (V3/R1) | Server with 8x A100 80GB or Mac cluster | ฿3,000,000-8,000,000 | Entire organization, 100+ users

What stands out is the "Starter" tier — a Mac Mini M4 Pro costing just around 40,000 baht can run the 7B model smoothly, responding to queries in 1-3 seconds. This makes it ideal for organizations that want to experiment with self-hosted AI before committing to a serious investment. At the Enterprise tier, running the full 671B-parameter model may cost up to 8 million baht in hardware, but for organizations with high AI usage volume, it pays off compared to long-term API costs.

How to Self-Host DeepSeek — Step by Step

The easiest way to get started with self-hosting DeepSeek is through Ollama, an open-source tool that makes running LLMs on your own machine as easy as running Docker — just 3 commands and you are up and running:

Step 1: Install Ollama

For Linux/macOS, open your Terminal and run a single command:

curl -fsSL https://ollama.com/install.sh | sh

For macOS, you can also download directly from ollama.com or use Homebrew: brew install ollama. For Windows, download the installer from the Ollama website.

Step 2: Download the DeepSeek Model

Choose the model that matches your hardware, then run:

# 7B model — suitable for regular laptops/PCs (8GB+ RAM)
ollama pull deepseek-r1:7b

# 32B model — suitable for PCs with GPU 24GB+
ollama pull deepseek-r1:32b

# 70B model — requires GPU 48GB+ or 96GB+ RAM
ollama pull deepseek-r1:70b

Ollama automatically downloads the model in a Quantized (compressed) format, which significantly reduces the file size and RAM requirements while only marginally reducing output quality.
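To see why quantization matters so much, note that a model's weight footprint is roughly parameters times bits per parameter. The sketch below compares FP16 against 4-bit quantization (the kind of compression Ollama's default tags typically use); the function and figures are illustrative estimates that ignore KV cache and runtime overhead, not measured sizes:

```python
def approx_weight_gb(params_billion: float, bits_per_param: float) -> float:
    """Rough weight-file size in GB: parameters x bits / 8, ignoring overhead."""
    bytes_total = params_billion * 1e9 * bits_per_param / 8
    return bytes_total / 1e9  # decimal GB

# 7B model: FP16 vs 4-bit quantized
fp16 = approx_weight_gb(7, 16)  # ~14 GB — too large for an 8GB laptop
q4 = approx_weight_gb(7, 4)     # ~3.5 GB — fits comfortably in 8GB RAM
print(f"FP16: {fp16:.1f} GB, 4-bit: {q4:.1f} GB")
```

The same arithmetic explains the enterprise tier: even at 8 bits per parameter, a 671B model needs on the order of 671GB for weights alone, hence the multi-GPU servers.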

Step 3: Run and Start Using It

ollama run deepseek-r1:7b

That is it — you can now chat with DeepSeek running entirely on your own machine. All data stays on your device; nothing is sent externally.

Step 4: Connect via API (For Integration with Other Systems)

Ollama includes a built-in API server. The example below uses its native endpoint, and Ollama also exposes an OpenAI-compatible endpoint, which means applications that currently connect to the ChatGPT API can switch to Ollama with virtually no code changes:

curl http://localhost:11434/api/generate \
  -d '{"model": "deepseek-r1:7b", "prompt": "Summarize this financial report...", "stream": false}'
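The same call can be made from code. The minimal Python sketch below targets Ollama's OpenAI-compatible endpoint (assuming the default port 11434); the prompt and function names are illustrative:

```python
import json
from urllib import request

# OpenAI-compatible endpoint exposed by a locally running Ollama server
OLLAMA_URL = "http://localhost:11434/v1/chat/completions"

def build_chat_request(model: str, user_prompt: str) -> dict:
    """Build an OpenAI-style chat payload that Ollama's /v1 endpoint accepts."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": user_prompt}],
        "stream": False,
    }

def ask(model: str, prompt: str) -> str:
    """POST the request to the local server and return the reply text."""
    payload = json.dumps(build_chat_request(model, prompt)).encode("utf-8")
    req = request.Request(OLLAMA_URL, data=payload,
                          headers={"Content-Type": "application/json"})
    with request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

With the server running (e.g. `ollama run deepseek-r1:7b` in another terminal), `ask("deepseek-r1:7b", "Summarize this financial report...")` returns the model's reply as a string — and nothing ever leaves localhost.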

Beyond Ollama, there are several alternative self-hosting tools suited to different use cases:

Tool | Key Strength | Best For
Ollama | Easiest setup, install in 3 minutes | Getting started, testing, general use
vLLM | Fastest inference, high concurrent-user support | Production, Enterprise
llama.cpp | Lightest weight, runs well on CPU | Machines without a GPU
Hugging Face TGI | Supports many model types, includes a dashboard | Data Science teams

Cost Comparison: Self-Host vs API — Over 1-3 Years

This is the table that decision-makers need to see — a real cost comparison between DeepSeek API and self-hosting over a 1-3 year period. The scenario assumes an organization processes approximately 10 million tokens per day (roughly 20-30 regular users):

Item | DeepSeek API | Self-Host 7B | Self-Host 70B
Hardware Cost | ฿0 | ฿60,000 (one-time) | ฿200,000 (one-time)
Monthly Cost | ~฿3,000 (API) | ฿0 (electricity ~฿500) | ฿0 (electricity ~฿2,000)
1-Year Total | ฿36,000 | ฿66,000 | ฿224,000
3-Year Total | ฿108,000 | ฿78,000 | ฿272,000
Data Privacy | ❌ Data goes to China | ✅ Stays internal | ✅ Stays internal

Note: The figures above are estimates for 10M tokens/day. If your organization processes higher volumes (more than 50M tokens/day), self-hosting becomes cost-effective much faster — potentially reaching ROI within 6 months, because API costs scale linearly with usage while hardware costs are fixed.

As the table shows, self-hosting the 7B model breaks even with the API at roughly the two-year mark and is cheaper from then on. Although the model is smaller, it performs remarkably well for everyday tasks like document summarization, question answering, and report drafting. The 70B model has higher hardware costs, making it better suited for organizations that need premium output quality and are willing to accept a larger upfront investment.
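The table's totals follow from a simple break-even formula: hardware cost divided by the monthly saving over the API. The sketch below uses the article's estimated figures (which are assumptions, not quoted prices):

```python
def breakeven_months(hardware_cost: float, monthly_self: float,
                     monthly_api: float) -> float:
    """Months until cumulative self-hosting cost drops below cumulative API cost."""
    monthly_saving = monthly_api - monthly_self
    if monthly_saving <= 0:
        return float("inf")  # self-hosting never catches up at this usage level
    return hardware_cost / monthly_saving

# 7B tier: ฿60,000 hardware, ~฿500/month electricity vs ~฿3,000/month API
print(breakeven_months(60_000, 500, 3_000))     # 24 months
# 70B tier: ฿200,000 hardware, ~฿2,000/month electricity
print(breakeven_months(200_000, 2_000, 3_000))  # 200 months
```

The second result makes the trade-off explicit: at 10M tokens/day, the 70B tier does not pay for itself within a realistic hardware lifetime; you choose it for output quality and privacy. At higher volumes the API bill grows while hardware cost stays fixed, so break-even arrives much sooner.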

Pros and Cons of Self-Hosting DeepSeek

Before making a decision, let us examine all the advantages and disadvantages:

✅ Pros:
  • 100% Data Privacy — not a single byte leaves your organization
  • No rate limits — use it as much as you want with no daily token cap
  • Full fine-tuning — customize the model to fit your organization's specific data and workflows
  • No vendor lock-in — switch to a different model at any time
  • No outage dependency — even if DeepSeek goes down, your system is unaffected
  • PDPA compliant — ideal for security-focused organizations

❌ Cons:
  • High upfront investment — hardware must be purchased in advance, especially GPUs
  • Requires an IT team — for updates, monitoring, and troubleshooting
  • Manual model updates — when DeepSeek releases new versions, you must update them yourself
  • Distilled quality may be lower — smaller models work well but cannot match the full 671B model
  • Higher electricity costs — GPU servers consume significant power (฿500-5,000/month)
  • Higher latency possible — large models on limited hardware may respond slower than the API

Security Considerations for Self-Hosting

While self-hosting solves the problem of data being sent to China, there are still cybersecurity concerns that must be addressed:

  • Network Isolation: The AI server should be placed in a separate VLAN and should not be directly accessible from the internet
  • Authentication: Implement API keys or tokens for access — never expose the API publicly without authentication
  • Logging: Record every request to enable audit trails — who asked what, and when
  • Model Integrity: Download models only from the official Hugging Face repository and verify checksums every time
  • Prompt Injection: Guard against prompt injection attacks by implementing input validation before passing data to the model
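The authentication, logging, and prompt-injection points above can be sketched as a small gateway layer placed in front of the local model server. Everything here is a hypothetical illustration (the key store, client IDs, and regex are placeholders); a real deployment would use a secrets vault and a far more robust injection filter:

```python
import hmac
import logging
import re
from datetime import datetime, timezone

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("ai-gateway")

# Hypothetical API keys — store real keys in a vault, never in source code
API_KEYS = {"team-finance": "s3cr3t-key"}

def authenticate(client_id: str, presented_key: str) -> bool:
    """Constant-time key comparison to avoid timing side channels."""
    expected = API_KEYS.get(client_id, "")
    return hmac.compare_digest(expected, presented_key)

# Naive deny-list; regex alone is NOT sufficient against prompt injection
SUSPICIOUS = re.compile(r"(ignore (all|previous) instructions|system prompt)", re.I)

def screen_prompt(prompt: str) -> str:
    """Reject prompts that trip the basic injection filter; return them otherwise."""
    if SUSPICIOUS.search(prompt):
        raise ValueError("prompt rejected by injection filter")
    return prompt

def audit(client_id: str, prompt: str) -> None:
    """Record who asked what, and when, for the audit trail (prompt truncated)."""
    log.info("%s | %s | %.80s",
             datetime.now(timezone.utc).isoformat(), client_id, prompt)
```

A request would pass through `authenticate`, then `screen_prompt`, then `audit` before being forwarded to the model on the isolated VLAN.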

Organizations with a robust Disaster Recovery plan will be well-positioned to self-host AI with confidence, as they already have backup and recovery systems in place to handle potential issues.

Real-World Use Cases for Thai Organizations

Many Thai organizations have already begun self-hosting AI, particularly in industries where data confidentiality is paramount:

  • Private hospitals: Using the 7B model to summarize medical records internally — patient data never leaves the hospital network
  • Law firms: Using the 32B model to analyze contracts and search through hundreds of pages of legal documents
  • Manufacturing companies: Using AI to analyze production line data, reduce waste, and minimize machine downtime — production data should never be sent externally
  • Government agencies: Government data cannot be sent to China — self-hosting is the only viable option for using AI
  • Financial institutions: Using AI for risk analysis while keeping financial data internal in compliance with Bank of Thailand regulations

For organizations already running an ERP system, self-hosted AI can be integrated with ERP to assist with data analysis, report generation, and answering employee queries about system data. This is precisely the direction the AI industry is heading in 2026.

Summary of Recommendations — How to Choose?

  • Limited budget + testing first → Start with Ollama + 7B model on a Mac/PC you already own — completely free, the only investment is your time
  • Production use for 10-30 users → Invest in an RTX 4090 + 32B model — quality is sufficient for analysis, summarization, and writing assistance
  • Enterprise + Compliance → Invest in a GPU server + 671B model or use a Mac Cluster — API-equivalent quality with data staying fully internal
  • No IT team → Use the ChatGPT or Claude API instead (safer than the DeepSeek API which sends data to China), as they offer clear DPAs and store data in the United States
  • Highly confidential data → Self-hosting is the only truly safe option, whether with DeepSeek or any other model

DeepSeek Series — Read More

DeepSeek Series — 5 Episodes on the Chinese AI Challenger.

Self-hosting AI is no longer a distant concept — a 40,000-baht Mac Mini can now run an AI model more capable than ChatGPT was just two years ago.

— Saeree ERP Team


Interested in ERP for your organization?

Consult with our expert team at Grand Linux Solution — free of charge

Request Free Demo

Call 02-347-7730 | sale@grandlinux.com


About the Author

Paitoon Butri

Network & Server Security Specialist, Grand Linux Solution Co., Ltd.