
Ollama + RAG — Build AI That Answers from Your Organization's Documents

Tags: LangChain, ChromaDB
3 April

Ollama Series EP.4 — From EP.3 where we learned to choose models and create Modelfiles, now we take it further — building AI that actually answers questions from your organization's documents. Not just general knowledge, but from company manuals, policies, financial reports, or any documents you feed it. All powered by a technique called RAG (Retrieval-Augmented Generation) — and most importantly, all data stays on your machine, never leaving the organization.

In short — What is RAG and why use it?

  • RAG = Retrieval-Augmented Generation — a technique that makes AI "search" documents before answering
  • Regular AI answers from "general knowledge" it was trained on — it doesn't know your organization's specifics
  • RAG enables AI to answer from your organization's actual documents — manuals, policies, reports, contracts, etc.
  • Combined with Ollama = data never leaves your machine
  • Key tools: Ollama + LangChain (or LlamaIndex) + ChromaDB
  • This article includes complete code examples you can follow immediately

What is RAG? — Why Regular AI Can't Answer from Your Documents

Imagine asking ChatGPT: "According to our company's leave policy, how many days can probationary employees take?" — ChatGPT can't answer because it has never seen your company's policy. It only knows general information it was trained on.

RAG (Retrieval-Augmented Generation) solves this by adding a "search" step before AI answers:

| Step | Regular AI | AI + RAG |
|------|------------|----------|
| 1. Receive question | Receive question | Receive question |
| 2. Search for info | ❌ No search, answers from memory | ✅ Searches relevant documents |
| 3. Generate answer | Answers from training data (may hallucinate) | Answers from actual documents + cites sources |
| Accuracy | Low (for organization-specific data) | High (answers from actual documents) |
| Hallucination | High — may fabricate information | Low — has document references |

The RAG technique was first proposed by Patrick Lewis and a team from Facebook AI Research (FAIR) in 2020, in a paper titled "Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks" (arXiv:2005.11401). Today, RAG is a standard technique used by virtually every AI company — ChatGPT, Claude, and Gemini all use some form of RAG for knowledge retrieval.

How Does RAG Work? — 5 Steps

Now that you understand the concept, let's look at how RAG works technically:

| Step | What It Does | Tool |
|------|--------------|------|
| 1. Load | Load documents (PDF, Word, TXT, CSV, HTML) | LangChain Document Loaders |
| 2. Split | Split documents into small chunks (500-1,000 characters) | RecursiveCharacterTextSplitter |
| 3. Embed | Convert text to vectors (numbers) for "semantic" search | Ollama Embeddings (nomic-embed-text) |
| 4. Store | Store vectors in a Vector Database | ChromaDB / FAISS / Qdrant |
| 5. Retrieve + Generate | When a question comes in → find relevant chunks → send them to the LLM to generate an answer | Ollama LLM (qwen2.5 / llama3.1) |
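Step 5 can be illustrated with a toy sketch before we get to the real tools: a hypothetical retriever that ranks chunks by word overlap with the question and "stuffs" the top match into a prompt. Real pipelines rank by vector similarity instead, and all names and sample data below are made up for illustration:

```python
import re

def words(text: str) -> set[str]:
    """Lowercase word set with punctuation stripped."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(question: str, chunks: list[str], k: int = 2) -> list[str]:
    """Rank chunks by word overlap with the question (real RAG uses vector similarity)."""
    q = words(question)
    return sorted(chunks, key=lambda c: len(q & words(c)), reverse=True)[:k]

def build_prompt(question: str, context_chunks: list[str]) -> str:
    """'Stuff' the retrieved chunks into the prompt that would be sent to the LLM."""
    context = "\n\n".join(context_chunks)
    return (f"Based on the following information, answer the question:\n\n"
            f"{context}\n\nQuestion: {question}")

chunks = [
    "Sick leave: probationary employees may take up to 15 days per year.",
    "Annual leave: employees receive 6 days after one full year of service.",
    "Overtime is paid at 1.5x the hourly rate on working days.",
]
top = retrieve("How many sick leave days for probationary employees?", chunks, k=1)
print(build_prompt("How many sick leave days?", top))
```

The LLM then answers from only the stuffed context, which is exactly what the `chain_type="stuff"` option in the hands-on section below does.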

What is Embedding? — The Heart of RAG

Embedding is the process of converting text into "coordinates in high-dimensional space" (Vectors) so computers can understand the "meaning" of text. For example, "sick leave" and "time off due to illness" would be converted into Vectors that are close together, even though they use different words.

Ollama has built-in Embedding Models. The most popular is nomic-embed-text — small (274 MB) but high quality, supports 8,192 Tokens per Chunk, and most importantly, runs entirely locally without sending any data outside.

# Download Embedding Model
ollama pull nomic-embed-text

# Test Embedding
curl http://localhost:11434/api/embeddings -d '{
  "model": "nomic-embed-text",
  "prompt": "Company leave policy"
}'
# Returns a 768-dimensional number array, e.g. [0.123, -0.456, 0.789, ...]
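To make "close together" concrete, here is cosine similarity, the measure most vector search uses, in plain Python. The vectors below are tiny made-up examples for illustration; real nomic-embed-text vectors have 768 dimensions:

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: closer to 1.0 = closer in meaning."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Hypothetical 4-dimensional embeddings (made up; real ones have 768 dimensions)
sick_leave   = [0.9, 0.1, 0.3, 0.0]   # "sick leave"
illness_off  = [0.8, 0.2, 0.4, 0.1]   # "time off due to illness"
overtime_pay = [0.1, 0.9, 0.0, 0.7]   # "overtime pay"

print(cosine_similarity(sick_leave, illness_off))   # high (similar meaning)
print(cosine_similarity(sick_leave, overtime_pay))  # low (different meaning)
```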

What is a Vector Database?

A Vector Database is a database designed to store and search vectors — instead of keyword matching (as in a traditional database like PostgreSQL), it searches by "semantic similarity". Popular choices with Ollama:

| Vector DB | Type | Highlights | Best For |
|-----------|------|------------|----------|
| ChromaDB | Open-source, embeddable | Easiest to install (`pip install chromadb`) | Getting started, prototypes, small orgs |
| FAISS | Library by Meta | Very fast, handles millions of vectors | Large datasets that need speed |
| Qdrant | Open-source, Docker | REST API, complex filtering | Production, medium-large orgs |
| pgvector | PostgreSQL extension | Works with existing PostgreSQL | Orgs already using PostgreSQL |
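Under the hood, all four do the same job. The sketch below implements that job as brute-force nearest-neighbor search in plain Python; the embeddings are made up, and production databases replace the linear scan with approximate indexes (such as HNSW) to scale to millions of vectors:

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

# A "vector database" as a plain list of (document text, hypothetical embedding)
store = [
    ("Sick leave policy: 15 days per year", [0.9, 0.1, 0.2]),
    ("Overtime pay: 1.5x on working days",  [0.1, 0.9, 0.1]),
    ("Office address and parking rules",    [0.2, 0.1, 0.9]),
]

def search(query_vector: list[float], k: int = 1) -> list[str]:
    """Rank every stored vector by similarity to the query and return the top k texts."""
    ranked = sorted(store, key=lambda item: cosine(query_vector, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]

# A query like "time off when ill" would embed near the sick-leave vector
print(search([0.8, 0.2, 0.3], k=1))
```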

Hands-on — Build RAG with Ollama + LangChain + ChromaDB

Let's build a real RAG Pipeline — this example feeds a PDF document to AI and asks questions from it:

Step 1: Install Dependencies

# Install Python packages
pip install langchain langchain-community langchain-ollama
pip install chromadb
pip install pypdf

# Download Models in Ollama
ollama pull qwen2.5
ollama pull nomic-embed-text

Step 2: Write RAG Pipeline

# file: rag_demo.py
from langchain_community.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_ollama import OllamaEmbeddings, ChatOllama
from langchain_community.vectorstores import Chroma
from langchain.chains import RetrievalQA
from langchain.prompts import PromptTemplate

# 1. Load - Load PDF document
loader = PyPDFLoader("company-policy.pdf")
documents = loader.load()
print(f"Loaded {len(documents)} pages")

# 2. Split - Split into Chunks
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,      # Size of each Chunk (characters)
    chunk_overlap=200,    # Overlap 200 chars (to preserve context)
    separators=["\n\n", "\n", ".", " "]
)
chunks = text_splitter.split_documents(documents)
print(f"Split into {len(chunks)} Chunks")

# 3. Embed + Store - Convert to Vector and store in ChromaDB
embeddings = OllamaEmbeddings(model="nomic-embed-text")
vectorstore = Chroma.from_documents(
    documents=chunks,
    embedding=embeddings,
    persist_directory="./chroma_db"  # Persist to disk
)
print("Vector Database created")

# 4. Create RAG Chain
llm = ChatOllama(model="qwen2.5", temperature=0.2)

prompt_template = PromptTemplate(
    template="""Based on the following information, answer the question:

Information:
{context}

Question: {question}

Answer (answer only from the provided information. If not found, say so):""",
    input_variables=["context", "question"]
)

qa_chain = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=vectorstore.as_retriever(
        search_kwargs={"k": 4}  # Retrieve 4 most relevant Chunks
    ),
    chain_type_kwargs={"prompt": prompt_template},
    return_source_documents=True
)

# 5. Ask a question!
result = qa_chain.invoke({
    "query": "How many sick leave days can probationary employees take?"
})

print("\n=== Answer ===")
print(result["result"])
print("\n=== Sources ===")
for doc in result["source_documents"]:
    print(f"- Page {doc.metadata.get('page', '?')}: {doc.page_content[:100]}...")

Step 3: Run

python rag_demo.py

# Example output:
# Loaded 45 pages
# Split into 128 Chunks
# Vector Database created
#
# === Answer ===
# According to company policy, Section 3, Article 3.2
# Probationary employees are entitled to no more than 15 working days of sick leave per year
# Must notify supervisor on the day of leave
# Sick leave of 3+ consecutive days requires a medical certificate
#
# === Sources ===
# - Page 12: Section 3 Leave 3.2 Sick Leave Probationary employees...

Where does data stay?

  • Original documents → on your machine
  • Embedding (Vector) → stored in ChromaDB on your machine (folder ./chroma_db)
  • LLM (qwen2.5) → runs on Ollama on your machine
  • Nothing leaves your machine throughout the entire process — suitable even for sensitive data such as financial, HR, or risk-management documents

Supports Multiple Document Formats

LangChain has Document Loaders for many file types — not just PDF:

| File Type | Loader | Requires |
|-----------|--------|----------|
| PDF | PyPDFLoader | `pip install pypdf` |
| Word (.docx) | Docx2txtLoader | `pip install docx2txt` |
| Excel (.xlsx) | UnstructuredExcelLoader | `pip install unstructured openpyxl` |
| CSV | CSVLoader | Included with LangChain |
| Text (.txt) | TextLoader | Included with LangChain |
| HTML | BSHTMLLoader | `pip install beautifulsoup4` |
| Entire folder | DirectoryLoader | Included with LangChain |

Shortcut — Use Open WebUI for RAG Without Coding

For those who don't want to write Python — Open WebUI has built-in RAG features. Just drag and drop files into the chat:

  1. Install Open WebUI as shown in EP.2
  2. Open a new Chat and select a Model (e.g., qwen2.5)
  3. Click the 📎 (attach file) icon and select a PDF, TXT, or DOCX file
  4. Open WebUI will automatically process the document into Chunks + Embeddings
  5. Ask questions — AI will answer from the attached document with source references

This method is perfect for non-developer users — such as accounting teams who want to ask AI about TFRS standards, or HR teams who need to ask about company policies.

RAG for ERP — Real Use Cases

For organizations already running ERP systems, RAG opens many possibilities:

| Use Case | Documents Fed | Example Questions |
|----------|---------------|-------------------|
| Company policy Q&A | Company policy, employee handbook | "How many maternity leave days?", "How is overtime calculated?" |
| Accounting help | TFRS standards, chart of accounts | "How to record fixed assets per IFRS?", "What's in chart of accounts category 5?" |
| Report summarization | Financial reports, budgets | "Summarize over-budget expenses this quarter", "What's the trend of manufacturing costs?" |
| ERP user manual | ERP manuals, SOPs | "How to create a purchase order in the system?", "Monthly closing steps?" |
| Contract review | Employment contracts, purchase agreements | "What are contract termination conditions?", "What's the penalty for late delivery?" |
| Preserve institutional knowledge | Meeting notes, best practices | "How did we resolve the stock mismatch issue?", "What was the latest ISO meeting conclusion?" |

Key Techniques — Making RAG Accurate

  1. Optimal Chunk Size: Use 500-1,000 characters — too small and AI loses context, too large and AI gets too much noise. Set overlap to 100-200 characters to avoid losing data at chunk boundaries.
  2. Choose k (number of retrieved Chunks) wisely: Start with k=3-5 — too few may miss important data, too many sends noise that confuses the AI.
  3. Use good prompts: Tell AI clearly to "answer only from provided information; if not found, say so" — reduces Hallucination significantly.
  4. Clean documents first: Remove repeated headers/footers, blank pages, watermarks — clean data gives better results.
  5. Test with known answers: Start by asking questions you already know the answer to, to verify RAG retrieves correctly.
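The effect of technique 1 is easy to see with a simplified fixed-size chunker. LangChain's RecursiveCharacterTextSplitter is smarter (it prefers splitting at paragraph and sentence boundaries), but the size/overlap arithmetic below is the same:

```python
def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 200) -> list[str]:
    """Naive fixed-size chunker: each chunk starts (chunk_size - overlap) after the last,
    so the tail of one chunk is repeated at the head of the next."""
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

document = "x" * 5000  # stand-in for a 5,000-character document

for size, overlap in [(500, 100), (1000, 200)]:
    chunks = chunk_text(document, size, overlap)
    print(f"chunk_size={size}, overlap={overlap} -> {len(chunks)} chunks")
```

Smaller chunks mean more of them (more storage, more retrieval candidates) but less context per chunk — which is exactly the trade-off behind the 500-1,000 character recommendation.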

RAG Limitations You Must Know:

  • Not Fine-tuning: RAG doesn't "retrain" the AI — it just provides additional information at answer time. Bad documents = bad answers.
  • Depends on document quality: Poorly OCR'd scans, complex Excel tables may give poor results.
  • Context Window is limited: Even with retrieved Chunks, LLMs have limited Context Windows (4K-128K Tokens). Sending too much data causes errors.
  • Thai language may need better models: Some Embedding models don't handle Thai well — use nomic-embed-text or mxbai-embed-large for better results.
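The context-window limitation in particular is easy to budget for with back-of-the-envelope arithmetic. The 4-characters-per-token ratio below is a rough heuristic for English text (Thai tokenizes less predictably, so treat all of these numbers as estimates only):

```python
def estimate_tokens(chars: int) -> int:
    """Rough estimate: ~4 characters per token for English text (heuristic, not exact)."""
    return chars // 4

chunk_size = 1000       # characters per retrieved chunk
k = 4                   # number of retrieved chunks
prompt_overhead = 200   # template + question, in tokens (rough guess)

context_tokens = k * estimate_tokens(chunk_size) + prompt_overhead
print(f"Approx. prompt size: {context_tokens} tokens")

for window in (4_096, 8_192, 128_000):
    fits = context_tokens < window
    print(f"  {window:>7}-token window: {'fits' if fits else 'TOO BIG'}")
```

With the defaults from the code above (k=4, 1,000-character chunks), the prompt stays comfortably inside even a 4K window — but raise k to 20 or the chunk size to 4,000 characters and the margin disappears quickly.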

Saeree ERP + RAG:

Saeree ERP is developing an AI Assistant using RAG techniques to let users query ERP data in natural language — for example, "What were last month's sales?" or "How many purchase orders are pending approval?" Interested? Consult our team free of charge.


"RAG transforms AI from 'knows everything but not about you' into 'an expert on your organization's documents' — and with Ollama, everything stays within your own walls."

- Saeree ERP Team


Interested in ERP for Your Organization?

Consult with Grand Linux Solution experts — free of charge

Request a Free Demo

Call 02-347-7730 | sale@grandlinux.com


About the Author

Paitoon Butri

Network & Server Security Specialist, Grand Linux Solution Co., Ltd.