3 April
Ollama Series EP.5 — In EP.4 we built RAG to answer from documents; now it's time to connect Ollama to real applications! Ollama has a built-in REST API running on port 11434 — callable from Python, JavaScript, cURL, or any language that can send HTTP requests. This article covers every API endpoint with ready-to-use code examples for Chatbots, document summarization, Automation Workflows, and Structured Output (JSON Mode).
In short — What can Ollama API do?
- REST API runs on http://localhost:11434 — callable from any language
- /api/generate — Generate text from a Prompt (Single-turn)
- /api/chat — Chat with conversation history (Multi-turn)
- /api/embeddings — Convert text to Vectors (for RAG)
- Structured Output — Force AI to respond in JSON following a defined Schema
- Streaming — Receive answers word-by-word in real-time (like ChatGPT)
- Compatible with OpenAI API — Drop-in replacement for OpenAI in existing apps
Ollama API — Endpoints Overview
Ollama Server runs on http://localhost:11434 automatically (starts after installing Ollama). Key endpoints:
| Endpoint | Method | What It Does | Use When |
|---|---|---|---|
| /api/generate | POST | Generate text from a Prompt | Single questions, summarize text, translate |
| /api/chat | POST | Chat with conversation history | Chatbot, AI assistant with memory |
| /api/embeddings | POST | Convert text to Vector | RAG, Semantic Search, document classification |
| /api/tags | GET | List available models | Show model dropdown in apps |
| /api/show | POST | Show model info (size, parameters) | Display model info in apps |
| /api/pull | POST | Download a model | Manage models via API |
| /api/delete | DELETE | Delete a model | Manage storage space |
| /v1/chat/completions | POST | OpenAI-compatible API | Drop-in replacement for OpenAI API in existing apps |
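Two endpoints from the table don't appear in the examples below, so here is a minimal Python sketch of /api/tags and /api/embeddings. It assumes the Ollama server is running locally and that an embedding model (nomic-embed-text is used here as an example) has already been pulled:

```python
import requests

OLLAMA = "http://localhost:11434"

def list_models():
    """GET /api/tags — return the names of locally available models."""
    r = requests.get(f"{OLLAMA}/api/tags")
    r.raise_for_status()
    return [m["name"] for m in r.json().get("models", [])]

def embed(text, model="nomic-embed-text"):
    """POST /api/embeddings — convert text to a vector (for RAG / search)."""
    r = requests.post(f"{OLLAMA}/api/embeddings",
                      json={"model": model, "prompt": text})
    r.raise_for_status()
    return r.json()["embedding"]

if __name__ == "__main__":
    print(list_models())                      # e.g. ['qwen2.5:latest', ...]
    print(len(embed("ERP integrates business data")))  # vector dimension
```

list_models() is handy for populating a model dropdown in an app, as the table suggests.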
/api/generate — Generate text from Prompt
The most basic endpoint — send a Prompt, get an answer back:
cURL
curl http://localhost:11434/api/generate -d '{
"model": "qwen2.5",
"prompt": "Explain what ERP is in 3 sentences",
"stream": false
}'
Python
import requests
response = requests.post("http://localhost:11434/api/generate", json={
"model": "qwen2.5",
"prompt": "Explain what ERP is in 3 sentences",
"stream": False
})
print(response.json()["response"])
JavaScript (Node.js / Fetch)
const response = await fetch("http://localhost:11434/api/generate", {
method: "POST",
headers: { "Content-Type": "application/json" },
body: JSON.stringify({
model: "qwen2.5",
prompt: "Explain what ERP is in 3 sentences",
stream: false
})
});
const data = await response.json();
console.log(data.response);
/api/chat — Chat with conversation history
The most important endpoint for building Chatbots — the API itself is stateless, so you send the full message history each turn, and that is how the AI "remembers" the previous conversation:
import requests
messages = [
{"role": "system", "content": "You are an ERP expert. Answer concisely."},
{"role": "user", "content": "What is ERP?"},
]
# Turn 1
response = requests.post("http://localhost:11434/api/chat", json={
"model": "qwen2.5",
"messages": messages,
"stream": False
})
assistant_reply = response.json()["message"]["content"]
print("AI:", assistant_reply)
# Turn 2 - AI remembers previous conversation
messages.append({"role": "assistant", "content": assistant_reply})
messages.append({"role": "user", "content": "What organization size is it suitable for?"})
response = requests.post("http://localhost:11434/api/chat", json={
"model": "qwen2.5",
"messages": messages,
"stream": False
})
print("AI:", response.json()["message"]["content"])
Structured Output — Force AI to Respond in JSON
One of Ollama API's most powerful features — you can force AI to respond in JSON following your defined Schema, making results immediately usable in apps without text parsing:
import requests, json
response = requests.post("http://localhost:11434/api/generate", json={
"model": "qwen2.5",
"prompt": "Analyze this email: 'Requesting to reschedule the meeting from Monday to Wednesday at 14:00 in Room A301'",
"stream": False,
"format": {
"type": "object",
"properties": {
"action": {"type": "string"},
"original_date": {"type": "string"},
"new_date": {"type": "string"},
"time": {"type": "string"},
"room": {"type": "string"},
"urgency": {"type": "string", "enum": ["low", "medium", "high"]}
},
"required": ["action", "new_date", "time"]
}
})
result = json.loads(response.json()["response"])
print(json.dumps(result, ensure_ascii=False, indent=2))
# Output:
# {
# "action": "reschedule_meeting",
# "original_date": "Monday",
# "new_date": "Wednesday",
# "time": "14:00",
# "room": "A301",
# "urgency": "medium"
# }
What can Structured Output do?
- Extract data from emails: automatically pull dates, times, locations, contacts
- Classify documents: AI reads documents and responds with JSON for category and priority level
- Generate system data: have AI create purchase orders as JSON and send them directly to the ERP system
- Analyze budget: AI reads reports and responds with JSON showing which items are over budget and by how much
Streaming — Receive Answers in Real-time
By default, Ollama API streams answers token by token — just like ChatGPT where text appears word by word, so users don't have to wait for the complete response:
import requests, json
# Streaming mode (default)
response = requests.post("http://localhost:11434/api/generate", json={
"model": "qwen2.5",
"prompt": "Explain what PDPA is",
"stream": True # default
}, stream=True)
for line in response.iter_lines():
if line:
data = json.loads(line)
print(data["response"], end="", flush=True)
if data.get("done"):
print() # New line when done
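The same streaming loop works for /api/chat, except each chunk carries the token in message.content instead of response. A minimal sketch, assuming the local Ollama server is running:

```python
import json
import requests

def stream_chat(prompt, model="qwen2.5"):
    """Stream a /api/chat response token by token (streaming is the default)."""
    response = requests.post("http://localhost:11434/api/chat", json={
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }, stream=True)
    for line in response.iter_lines():
        if not line:
            continue
        data = json.loads(line)
        # Chat chunks nest the token under "message", not "response"
        print(data["message"]["content"], end="", flush=True)
        if data.get("done"):
            print()  # new line when the stream finishes

if __name__ == "__main__":
    stream_chat("Explain what PDPA is")
```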
OpenAI-Compatible API — Drop-in OpenAI Replacement
Ollama supports the OpenAI-compatible API at /v1/chat/completions — meaning apps written for OpenAI API can switch to Ollama by just changing the URL with no other code changes:
# Use OpenAI Python SDK with Ollama
from openai import OpenAI
client = OpenAI(
base_url="http://localhost:11434/v1", # Just change the URL
api_key="ollama" # Any value works (Ollama doesn't need an API Key)
)
response = client.chat.completions.create(
model="qwen2.5",
messages=[
{"role": "system", "content": "You are an ERP expert"},
{"role": "user", "content": "How to choose the right ERP for your organization?"}
]
)
print(response.choices[0].message.content)
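Streaming also works through the compatibility layer by passing stream=True to the same SDK call. A sketch below; stream_openai_chat is our own illustrative wrapper, and it assumes the openai package is installed and the local Ollama server is running (the import is done inside the function so the file loads even before `pip install openai`):

```python
def stream_openai_chat(prompt, model="qwen2.5"):
    """Stream tokens through Ollama's OpenAI-compatible /v1 endpoint."""
    from openai import OpenAI  # imported lazily; requires `pip install openai`

    client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")
    stream = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        stream=True,  # chunks arrive as they are generated
    )
    for chunk in stream:
        delta = chunk.choices[0].delta.content
        if delta:
            print(delta, end="", flush=True)
    print()

if __name__ == "__main__":
    stream_openai_chat("What is MRP in one sentence?")
```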
Why OpenAI-Compatible?
- Apps already using OpenAI API (such as MCP, LangChain, AutoGen) can switch to Ollama immediately
- No API Key needed — save costs, no monthly fees
- Data stays local — switch from sending to OpenAI to processing on your machine
Real Example — Build a Web Chatbot
This example builds a simple Chatbot with Python + Flask connected to Ollama:
# file: chatbot.py
from flask import Flask, request, jsonify, render_template_string
import requests
app = Flask(__name__)
HTML = """
<!DOCTYPE html>
<html>
<head><title>ERP AI Assistant</title></head>
<body>
<h2>ERP AI Assistant (Powered by Ollama)</h2>
<div id="chat" style="height:400px;overflow-y:auto;border:1px solid #ccc;padding:10px;"></div>
<input id="msg" style="width:80%" placeholder="Ask about ERP...">
<button onclick="send()">Send</button>
<script>
async function send() {
const msg = document.getElementById('msg').value;
document.getElementById('chat').innerHTML += '<p><b>You:</b> ' + msg + '</p>';
document.getElementById('msg').value = '';
const res = await fetch('/chat', {
method: 'POST',
headers: {'Content-Type': 'application/json'},
body: JSON.stringify({message: msg})
});
const data = await res.json();
document.getElementById('chat').innerHTML += '<p><b>AI:</b> ' + data.reply + '</p>';
}
</script>
</body>
</html>
"""
@app.route("/")
def index():
return render_template_string(HTML)
@app.route("/chat", methods=["POST"])
def chat():
user_message = request.json["message"]
response = requests.post("http://localhost:11434/api/generate", json={
"model": "qwen2.5",
"prompt": user_message,
"system": "You are an ERP expert. Answer concisely.",
"stream": False
})
return jsonify({"reply": response.json()["response"]})
if __name__ == "__main__":
app.run(port=5000)
# Run
pip install flask requests
python chatbot.py
# Open http://localhost:5000
Connect with Automation — n8n / Make
Ollama API integrates easily with Automation tools — since it's a standard REST API, tools like n8n (Self-hosted) or Make (Cloud) can call Ollama API via HTTP Request Node:
| Workflow | Steps | Benefit |
|---|---|---|
| Summarize Email | Receive email → send content to Ollama for summary → send LINE notification | Executives read summaries instead of long emails |
| Classify Documents | Upload file → Ollama classifies → save to Database | Reduce manual document sorting |
| Translate Documents | Receive TH document → Ollama translates to EN → return via Webhook | Free translation, no Translation API costs |
| Anomaly Detection | Pull data from ERP → Ollama analyzes → alert if anomalies found | Proactive risk management |
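As a sketch, the "Summarize Email" workflow from the table reduces to three small functions in plain Python; notify() here is a hypothetical stand-in for the LINE notification step (swap in whatever channel your organization uses):

```python
import requests

OLLAMA_URL = "http://localhost:11434/api/generate"

def summarize_email(body, model="qwen2.5"):
    """Step 2: send the email body to Ollama and get a short summary back."""
    r = requests.post(OLLAMA_URL, json={
        "model": model,
        "prompt": f"Summarize this email in 3 bullet points:\n\n{body}",
        "stream": False,
    })
    r.raise_for_status()
    return r.json()["response"]

def notify(summary):
    """Step 3: hypothetical notification hook (LINE, Slack, email, ...)."""
    print("NOTIFY:", summary)

def handle_incoming_email(body):
    """Step 1 glue: an email arrives, a summary notification goes out."""
    notify(summarize_email(body))
```

In n8n or Make the same chain is just three nodes: an email trigger, an HTTP Request node POSTing the payload above to Ollama, and a notification node.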
Important Security Notes:
- Never expose Ollama API directly to the Internet — by default Ollama listens on localhost only (safe). If you need remote access, use a Reverse Proxy (nginx/Caddy) + Authentication
- No built-in Authentication: Ollama API has no API Key/Token — if opening to other machines, add an Auth Layer yourself
- Rate Limiting: if multiple users share the server, set Rate Limits to prevent overload
- Security details are covered in EP.6: Secure Self-Hosted AI
Saeree ERP + Ollama API:
With the Ollama API, organizations can connect AI directly to their ERP system — having AI summarize reports, analyze manufacturing costs, or help prepare budget data, with all data staying within the organization's network. Interested? Consult our team for free.
Ollama Series — Read More
Ollama Series — 6 Episodes, Complete Local AI Guide:
- EP.1: What Is Ollama? — Run AI on Your Own Machine
- EP.2: Install Ollama on Every OS — macOS / Windows / Linux
- EP.3: Using Ollama for Real — Choosing Models, Writing Prompts, and Creating Modelfiles
- EP.4: Ollama + RAG — Build AI That Answers from Your Documents
- EP.5: Ollama API — Connect AI to Your Apps and Enterprise Systems (this article)
- EP.6: Secure Self-Hosted AI — Security & Best Practices
"Ollama API transforms AI from a single-user Terminal tool into a service the entire organization can access — with just one HTTP Request."
- Saeree ERP Team



