29 April
Imagine your organization has 100,000 digitally signed, paperless documents: procurement contracts, disbursement records, board meeting minutes, employee personal data (PII), and beneficiary information, all stored as PDF files in your ERP system.
The questions are: Where are they stored, in the database or in the file system? Who can access them: only users of the application, or anyone with server access? If someone copied all of them out, would you know? And if a hard drive is lost or a backup tape is stolen, can the files be opened?
Many organizations have never asked these questions thoroughly until an incident actually happens. This article explains the 3 approaches to document storage in ERP and the trade-offs executives must understand before deciding.
Approach #1: Store PDF Files in Database
The simplest approach, and a common one in ERP systems: store each PDF file as a BLOB (Binary Large Object) in the database.
┌─────────────────────────────┐
│ PostgreSQL Database         │
│ ┌────────────────────────┐  │
│ │ c_edocument table      │  │
│ ├────────────────────────┤  │
│ │ id      | metadata     │  │
│ │ name    | metadata     │  │
│ │ pdf_blob| ★ PDF stored │  │
│ │         | here         │  │
│ └────────────────────────┘  │
└─────────────────────────────┘
Advantages:
- Good security: Users must go through the application, and there are no loose files on disk to copy (a DBA with direct database access can still read the BLOBs)
- Single backup set: Backup database = backup documents
- Transaction Consistency (ACID): Add/delete document and metadata in one transaction
- Complete audit log: All access goes through the app, so logs are captured
- No orphan files: Delete record = delete file, no leftover files
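To illustrate the transaction-consistency and no-orphan points, here is a minimal sketch. It uses Python's built-in sqlite3 module as a stand-in for PostgreSQL, purely so the example runs without a server. The table name c_edocument and the pdf_blob column come from the diagram above; the helper store_document is a hypothetical name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE c_edocument ("
    "id INTEGER PRIMARY KEY, name TEXT NOT NULL, pdf_blob BLOB NOT NULL)"
)


def store_document(conn, name, pdf_bytes):
    """Insert metadata and file content together; roll back both on any error."""
    with conn:  # commits on success, rolls back if an exception is raised
        cur = conn.execute(
            "INSERT INTO c_edocument (name, pdf_blob) VALUES (?, ?)",
            (name, pdf_bytes),
        )
        return cur.lastrowid


doc_id = store_document(conn, "AIP-001.pdf", b"%PDF-1.7 example bytes")

# A failed insert (a NULL blob violates NOT NULL) leaves no half-written record.
try:
    store_document(conn, "broken.pdf", None)
except sqlite3.IntegrityError:
    pass

row_count = conn.execute("SELECT COUNT(*) FROM c_edocument").fetchone()[0]
print(doc_id, row_count)
```

Because the file content lives in the same row as its metadata, deleting the row removes the document as well, which is why this approach cannot leave orphan files.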
Disadvantages:
- Database grows very fast: PDF average 2 MB × 1 million documents = 2 TB
- Backup is slow and uses lots of space: Must back up everything every time
- Performance drops when DB is large: Slow query, difficult indexing
- HA/Replication is expensive: Must replicate everything to standby
- Database tuning gets harder: Large BLOBs affect buffer pool
Size example:
50,000 documents/year × 2 MB = 100 GB/year
10 years = 1 TB in database
Large organization:
500,000 documents/year × 2 MB = 1 TB/year
10 years = 10 TB in database
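The arithmetic above can be reproduced with a small helper, which is handy for plugging in your own volumes. This is only a sketch: the function name is hypothetical, and decimal units (1 GB = 1,000 MB) are assumed because they match the round figures in the example.

```python
def projected_size_gb(docs_per_year, avg_mb, years):
    """Projected PDF storage in GB, using decimal units (1 GB = 1000 MB)."""
    return docs_per_year * avg_mb * years / 1000


# First example: 50,000 documents/year at 2 MB each
first_year = projected_size_gb(50_000, 2, 1)      # GB after 1 year
first_decade = projected_size_gb(50_000, 2, 10)   # GB after 10 years

# Large organization: 500,000 documents/year at 2 MB each
large_decade = projected_size_gb(500_000, 2, 10)  # GB after 10 years

print(first_year, first_decade, large_decade)
```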
Suitable for: Small to medium organizations with fewer than 100,000 documents that prioritize security.
Approach #2: Store PDF Files in File System
The most common approach in ERP systems that need to scale — store files in the file system or shared storage and only keep the path in the database.
┌─────────────────────────┐      ┌──────────────────────┐
│ PostgreSQL              │      │ File System          │
│ ┌────────────────────┐  │      │ /docs/               │
│ │ c_edocument        │  │      │ ├─ 2026/             │
│ ├────────────────────┤  │ ──→  │ │  ├─ 04/            │
│ │ id   | metadata    │  │      │ │  │  ├─ AIP-001.pdf │
│ │ path | ★ file path │  │      │ │  │  ├─ AIP-002.pdf │
│ │      | here        │  │      │ │  │  └─ ...         │
│ └────────────────────┘  │      │ └─ ...               │
└─────────────────────────┘      └──────────────────────┘
                                            ↑
                                   Open directly:
                                   anyone with folder
                                   access can copy all
Advantages:
- Database is light and fast: Only stores metadata + path
- Good performance: Fast queries, small index
- Cheap storage: NAS, S3 are much cheaper than DB storage
- Streams large files well: Sending big files doesn't affect DB
- Easy to scale: Add disk, add path
- Backup can be split: Back up DB and files using different strategies
Disadvantages — and this is where the danger lies:
- ❌ Anyone with file system access can copy the entire folder
- ❌ Bypassing the app = no audit log: Opening files directly bypasses the system, which won't know
- ❌ No app-level ACL: File system permissions use a different system from user roles
- ❌ Sysadmins, DBAs, and outsourced vendors have deep access: anyone with a login on the server can read the files without going through the application
- ❌ Backup tape stolen → all readable: Files are in plaintext form
- ❌ Orphan files: Deleting DB record may leave files behind
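The orphan-file problem in the last point can at least be detected by periodically comparing what is on disk with what the database knows about. The following is a sketch under assumptions: find_orphans is a hypothetical helper, a temporary directory stands in for /docs/, and the set db_paths stands in for the result of a query over the path column.

```python
import os
import tempfile


def find_orphans(root, known_paths):
    """Return files under root whose relative path is not recorded in the database."""
    orphans = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            rel = os.path.relpath(os.path.join(dirpath, filename), root)
            rel = rel.replace(os.sep, "/")  # normalize to forward slashes
            if rel not in known_paths:
                orphans.append(rel)
    return sorted(orphans)


# Demo: a temporary directory stands in for /docs/
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "2026", "04"))
    for name in ("AIP-001.pdf", "AIP-002.pdf"):
        with open(os.path.join(root, "2026", "04", name), "wb") as f:
            f.write(b"%PDF-")
    # Pretend the database row for AIP-002.pdf was deleted but the file was not.
    db_paths = {"2026/04/AIP-001.pdf"}
    orphans = find_orphans(root, db_paths)

print(orphans)  # ['2026/04/AIP-002.pdf']
```

A check like this finds leftovers, but it does nothing about the other five risks in the list, which all come from the files being readable outside the application.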
Real Risk Scenarios
An engineer from an IT vendor comes to troubleshoot a server and is given root/sudo access to check the system. They can run tar czf /tmp/all.tar.gz /docs/ in seconds and walk out with a USB drive → the entire document repository is gone in one day.
A sysadmin who has maintained the system for years resigns. Before losing access, they copy the entire /docs/ folder. When they move to a competitor → they have all the data.
A backup tape is sent to off-site storage. It is lost during transport. Whoever finds the tape can read everything.
A server in the data center has its hard drive stolen. The disk is removed and installed in another machine. Opening the /docs/ folder reveals every file.
In all these scenarios — the application's audit log will see nothing, because the access happens at the OS level, not through the application.
Suitable for: Not recommended for important government documents — unless stored encrypted (see Approach #3).
Approach #3: Encrypted Object Storage (The Solution to Both Problems)
This approach combines the security of Approach #1 with the scalability of Approach #2 by adding an encryption layer.
┌────────────────────────────────────────────────────┐
│ User → Through app only (HTTPS + Auth)             │
└────────────────────────────────────────────────────┘
                          │
                          ▼
┌────────────────────────────────────────────────────┐
│ Saeree ERP App (sole holder of master key)         │
│ ├─ Authenticate user                               │
│ ├─ Check role-based permission                     │
│ ├─ Complete audit log                              │
│ └─ Decrypt + stream to browser                     │
└─────────┬──────────────────────────────┬───────────┘
          │                              │
          ▼ (metadata only)              ▼ (encrypted blob)
┌──────────────────┐           ┌─────────────────────┐
│ PostgreSQL       │           │ Object Storage      │
│ - file_id        │           │ /vault/             │
│ - storage_key    │           │ ├─ a8f3c2.enc       │
│   (random UUID)  │           │ ├─ d7e9b1.enc       │
│ - sha256_hash    │           │ ├─ c4f5a6.enc       │
│ - access_rules   │           │ └─ ...              │
│ - encrypted_dek  │           │                     │
└──────────────────┘           │ ★ AES-256 encrypted │
                               │ Cannot open directly│
                               │ Random filenames    │
                               └─────────────────────┘
                                         │
                                         ▼
                               ┌──────────────────────┐
                               │ Key Management       │
                               │ Service (KMS / HSM)  │
                               │ - Master keys        │
                               │ - Separated from data│
                               └──────────────────────┘
Key Principles
1. Separation of Concerns
Each component has a clear responsibility:
- Database: Stores only metadata (light and fast)
- Object Storage: Stores ciphertext (scalable)
- KMS: Stores keys (security)
If anyone gets only one component — they cannot open the documents.
2. Encryption at Application Layer
Files are encrypted before being written to storage and decrypted only at the app. Even if a sysadmin or DBA accesses storage directly — they only see ciphertext.
3. Filename Obfuscation
Filenames in storage are random UUIDs like a8f3c2d1.enc, revealing no content via the path. Anyone who sees the folder structure cannot guess which file is important.
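As a rough sketch of how filename obfuscation might look in code, the helper below builds a metadata row with a random storage name and a content hash. The field names storage_key and sha256_hash come from the diagram above; original_name and the function new_storage_record are hypothetical, and hashing the content passed in (rather than, say, the ciphertext) is an assumption.

```python
import hashlib
import uuid


def new_storage_record(original_name, content):
    """Build the metadata row; the real filename is kept in the database only."""
    return {
        "original_name": original_name,             # hypothetical column, database only
        "storage_key": f"{uuid.uuid4().hex}.enc",   # random name used in object storage
        "sha256_hash": hashlib.sha256(content).hexdigest(),  # integrity check on read
    }


record = new_storage_record("board-minutes-2026-04.pdf", b"example content")
print(record["storage_key"])  # 32 random hex characters followed by .enc
```

The mapping from storage_key back to a meaningful name exists only in the database, so a directory listing of /vault/ on its own tells an intruder nothing about which file is which.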
4. Per-document Encryption Keys
Each file has its own key, so a leaked key exposes only that one file, not the whole repository. The "blast radius" of any single leak stays small.
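Per-document keys are usually implemented as envelope encryption: each file gets its own data encryption key (DEK), and the DEK is itself encrypted ("wrapped") with the master key. The sketch below is not Saeree ERP's implementation. It assumes the third-party Python cryptography package and AES-256 in GCM mode (the article specifies only AES-256), and it keeps MASTER_KEY in process memory as a stand-in for a key that would really live in a KMS or HSM. All function names are hypothetical.

```python
import os

from cryptography.hazmat.primitives.ciphers.aead import AESGCM

# Stand-in for the master key; a real deployment would keep this inside a KMS/HSM.
MASTER_KEY = AESGCM.generate_key(bit_length=256)


def encrypt_document(plaintext, storage_key):
    """Encrypt one document with its own data key (DEK), then wrap the DEK."""
    dek = AESGCM.generate_key(bit_length=256)
    doc_nonce = os.urandom(12)
    # storage_key is bound as associated data, so blobs cannot be swapped unnoticed.
    ciphertext = AESGCM(dek).encrypt(doc_nonce, plaintext, storage_key.encode())
    dek_nonce = os.urandom(12)
    encrypted_dek = AESGCM(MASTER_KEY).encrypt(dek_nonce, dek, None)
    return {
        "storage_key": storage_key,
        "ciphertext": doc_nonce + ciphertext,        # goes to object storage
        "encrypted_dek": dek_nonce + encrypted_dek,  # goes to the database
    }


def decrypt_document(record):
    """Unwrap the DEK with the master key, then decrypt the document."""
    wrapped = record["encrypted_dek"]
    dek = AESGCM(MASTER_KEY).decrypt(wrapped[:12], wrapped[12:], None)
    blob = record["ciphertext"]
    return AESGCM(dek).decrypt(blob[:12], blob[12:], record["storage_key"].encode())


record = encrypt_document(b"%PDF-1.7 confidential", "a8f3c2.enc")
roundtrip = decrypt_document(record)
print(roundtrip == b"%PDF-1.7 confidential")
```

One practical consequence of this design is that rotating the master key only requires re-wrapping the small encrypted_dek values in the database; the large document blobs in storage do not need to be re-encrypted.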
Combined Advantages
- ✓ Light DB like Approach #2
- ✓ Good performance like Approach #2
- ✓ Easy to scale like Approach #2
- ✓ No loose plaintext files for a sysadmin to copy, like Approach #1, and a DBA sees only metadata and encrypted keys
- ✓ Backup tape lost → cannot be opened (better than both)
- ✓ Direct copies are useless: files copied straight out of storage are unreadable ciphertext
Disadvantages:
- More complex implementation
- Must maintain a separate KMS
- Key rotation must be planned
Comparison of 3 Approaches
| Dimension | A: Database | B: File System | C: Encrypted Storage ★ |
|---|---|---|---|
| Performance | Medium | Excellent | Good |
| Database Size | Large | Light | Light |
| Storage Cost | Expensive | Cheap | Cheap |
| Bypass app access | Possible via DB | Easy | Not possible (ciphertext only) |
| Sysadmin Risk | Medium | High | Low |
| DBA Risk | High | Low | Low |
| Backup Tape Stolen | Risky | Risky | Safe |
| Audit Log | Through app | May bypass | Through app |
| Scale 1TB+ | Difficult | Good | Good |
| Implementation | Easy | Easy | Medium |
If Someone Copies Documents Out — How Do You Know?
This is the question many executives worry about. Let's address it honestly.
Technical Truth
Copies of digital files are harder to detect than photocopies of paper because they:
- Leave no physical trace
- Have no paper watermark
- Are 100% identical in quality to the original
- Can be made in a second
But Well-Designed Systems Can Do These Things
1. Detect Abnormal Access
- Login at 3 AM → alert
- Login from foreign IP → alert
- Bulk export with no workflow reason → alert
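Rules like the three above can be expressed as simple checks on each access event. The sketch below is illustrative only: the AccessEvent fields, the off-hours window (midnight to 5 AM), the home country code "TH", and the bulk threshold of 100 files are all assumptions to be tuned per organization.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class AccessEvent:
    user: str
    timestamp: datetime
    country: str                 # resolved from the client IP by an upstream lookup
    files_exported: int
    workflow_id: Optional[str]   # None means no workflow justified the export


def check_event(event, home_country="TH", bulk_threshold=100):
    """Return the list of alert labels triggered by a single access event."""
    alerts = []
    if 0 <= event.timestamp.hour < 5:
        alerts.append("off-hours login")
    if event.country != home_country:
        alerts.append("foreign IP")
    if event.files_exported >= bulk_threshold and event.workflow_id is None:
        alerts.append("bulk export without workflow")
    return alerts


event = AccessEvent(
    user="user01",
    timestamp=datetime(2026, 4, 29, 3, 0),
    country="TH",
    files_exported=500,
    workflow_id=None,
)
alerts = check_event(event)
print(alerts)  # ['off-hours login', 'bulk export without workflow']
```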
2. Per-Access Watermark Identifying the Recipient
Every time someone downloads a PDF — the system embeds a watermark identifying:
- Viewer name
- Date and time
- IP address
- Transaction number
Result: If a PDF leaks publicly, the watermark lets you trace the leaked copy back to the account, time, and IP address that downloaded it.
There are two types:
- Visible watermark: Visible to the eye (overlaid on the document)
- Invisible watermark: Hidden in the file's metadata or embedded steganographically in its content
3. Tamper-Evident Audit Trail
Records every access in a form that cannot be retroactively modified without detection:
- Who opened it
- When
- From which device
- IP address
- How long they viewed it / whether they downloaded it
This uses a hash chain (the same idea that underlies a blockchain): each log entry includes the hash of the previous entry, so anyone who modifies a past entry breaks the chain and is detected the next time the chain is verified.
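A hash chain is simple enough to sketch in a few lines. The example below uses SHA-256 and JSON serialization; those choices, the all-zero genesis value, and the function names are assumptions for illustration, not a description of any particular product.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def entry_hash(prev_hash, record):
    """Hash the previous entry's hash together with this entry's content."""
    payload = json.dumps(record, sort_keys=True).encode()
    return hashlib.sha256(prev_hash.encode() + payload).hexdigest()


def append_entry(log, record):
    prev_hash = log[-1]["hash"] if log else GENESIS
    log.append({
        "record": record,
        "prev_hash": prev_hash,
        "hash": entry_hash(prev_hash, record),
    })


def verify_chain(log):
    """Recompute every hash from the start; any edited entry breaks the chain."""
    prev_hash = GENESIS
    for entry in log:
        if entry["prev_hash"] != prev_hash:
            return False
        if entry["hash"] != entry_hash(prev_hash, entry["record"]):
            return False
        prev_hash = entry["hash"]
    return True


log = []
append_entry(log, {"user": "user01", "action": "open", "doc": "a8f3c2.enc"})
append_entry(log, {"user": "user02", "action": "download", "doc": "d7e9b1.enc"})
ok_before = verify_chain(log)

log[0]["record"]["user"] = "someone-else"  # simulate a retroactive edit
ok_after = verify_chain(log)

print(ok_before, ok_after)  # True False
```

Note that someone able to rewrite the entire log could recompute every hash, so the most recent hash should also be stored somewhere the log's administrators cannot alter.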
4. DRM (Digital Rights Management)
Limits what can be done with a PDF after download:
- No printing
- No copying of text
- No editing
- Open for only 7 days then re-verify
- Remote revocation
5. Behavioral Monitoring + AI
The system learns each user's normal behavior and alerts when behavior deviates from normal:
- An employee who normally opens 10 files/day → opens 500 files in one day
- An executive who normally accesses from the office → accesses from mobile at night
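The first example can be approximated without any machine learning at all. The sketch below flags a day's file count when it is far above the user's own history, using a plain z-score. It is a deliberately simple stand-in for the "AI" mentioned above, and the threshold of 3 standard deviations and the function name are assumptions.

```python
from statistics import mean, stdev


def is_anomalous(history, today, threshold=3.0):
    """Flag today's count if it exceeds the user's mean by `threshold` std deviations."""
    if len(history) < 2:
        return False  # not enough data to form a baseline
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        return today > mu
    return (today - mu) / sigma > threshold


history = [9, 11, 10, 12, 8, 10, 10]  # files opened per day, roughly 10/day
normal_day = is_anomalous(history, 11)
bulk_day = is_anomalous(history, 500)

print(normal_day, bulk_day)  # False True
```

A production system would track more signals (time of day, device, location), but the principle is the same: compare each user against their own baseline rather than against a fixed global limit.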
Comparison: Paper vs. Well-Designed Paperless
Many people think paper documents are safer — let's compare.
| Threat | Paper | Well-Designed Paperless |
|---|---|---|
| Outsider physical theft | Risky | Preventable |
| Insider photocopying and sharing | Very risky | Preventable + traceable |
| Know who viewed the document | Cannot | Yes (audit log) |
| Know who took the document out | Cannot | Yes (watermark) |
| Detect modifications | Sometimes | Very well (crypto) |
| Mass copies | Hard | Easy ⚠ |
| Remote access revocation | Cannot | Yes (DRM) |
| Long-term audit retention | Cannot | 10+ years |
Observation: Paperless has a "forensic capability" that paper cannot match: every access through the application leaves a record of who did what to which document, and when.
Truths to Accept
Before deciding, there are 3 principles to understand:
1. No system is "100% protective"
Even paper can be photocopied. Digital has screenshots and screen recording — security is about "probability," not "absolute protection."
2. The goal is not "100% prevention" but:
- Reduce attack surface
- Increase difficulty of leaking
- Increase cost for a potential leaker
- Detect and track when it happens
- Be able to prove in court who leaked
3. Paperless's edge over paper is "Forensic capability"
It's not just "preventing theft" — it's "knowing who stole" — which is a powerful deterrent.
Recommendations for Government Organizations
If Currently Using Approach #1 (DB)
Do now:
- Maintain the current security level (application-only access, tightly restricted direct database access)
- Plan migration before DB reaches 1 TB
Long-term:
- Consider migrating to Approach #3 when size starts impacting performance
If Using Approach #2 (File System)
Do immediately:
- ⚠ Audit file system permissions
- Encrypt backup tapes
- Restrict sysadmin privileges
- Add OS-level audit logs
Mid-term:
- Add encryption layer (Phase 1 of Approach #3)
- Effort: 2-4 weeks — no need to re-architect the system
Long-term:
- Migrate fully to Approach #3
- Add KMS, watermark, behavioral monitoring
If Selecting a New System
Ask these questions to vendors:
- Where are PDFs stored? (DB / File System / Encrypted Storage)
- If a hard drive is stolen, can the documents be opened?
- Can the vendor's sysadmin access documents directly?
- Is there a separate KMS for key management?
- Does the system support per-access watermarking?
- Is behavioral monitoring available?
- Are audit logs immutable?
- Are backups encrypted?
What Storage Level Should Organizations Use
| Document Level | Recommended Approach | Reason |
|---|---|---|
| Public documents (notices, bid results) | A or B | No high protection needed |
| Internal documents (general) | C (Phase 1) | encryption + audit |
| Important contracts | C (full) | + watermark + DRM |
| Board documents | C (full) | + behavioral monitor |
| Personal data (PII) | C (full) | Required by PDPA (Thailand's Personal Data Protection Act) |
| National security data | C + HSM-backed | Highest level |
Conclusion
"Choosing how to store documents in ERP isn't just a technical decision. It's a strategic decision that affects the security, credibility, and long-term sustainability of your organization for 10+ years." (Saeree ERP Team)
Storing files in an unencrypted file system may seem economical and convenient today, but on the day a leak happens or a backup tape goes missing, the organization will discover that the "savings" cost far more than expected.
The best option is Approach #3 (Encrypted Object Storage), which provides both security and performance, and can start from Phase 1 (adding encryption to existing storage) before gradually expanding.
About Saeree ERP
Saeree ERP by Grand Linux Solution currently supports PDF storage in either the database or the file system, depending on each organization's needs. The roadmap targets full Encrypted Object Storage in 2026, including a Key Management Service and per-access watermarking, an architecture the team considers the highest standard of document storage in ERP systems today. The product will continue to evolve to meet upcoming standards.
Related articles: "Paperless Checklist: 10 Essentials for Thai Government Organizations"; "What Is a Digital Signature?"; "What Is 2FA? Why Two-Factor Authentication Matters"
This article was created based on experience designing ERP systems for Thai government organizations. For consultation on document storage in your system, contact us at sale@grandlinux.com or 02-347-7730


