3 April
Ollama Series EP.5 — In EP.4 we built RAG to answer from documents; now it's time to connect Ollama to real applications! Ollama has a built-in REST API running on port 11434 — callable from Python, JavaScript, cURL, or any language that can send HTTP requests. This article covers every API endpoint with ready-to-use code examples for Chatbots, document summarization, Automation Workflows, and Structured Output (JSON Mode).
In short — What can Ollama API do?
- REST API runs on http://localhost:11434 — callable from any language
- /api/generate — Generate text from a Prompt (Single-turn)
- /api/chat — Chat with conversation history (Multi-turn)
- /api/embeddings — Convert text to Vectors (for RAG)
- Structured Output — Force AI to respond in JSON following a defined Schema
- Streaming — Receive answers word-by-word in real-time (like ChatGPT)
- Compatible with OpenAI API — Drop-in replacement for OpenAI in existing apps
Ollama API — Endpoints Overview
Ollama Server runs on http://localhost:11434 automatically (starts after installing Ollama). Key endpoints:
| Endpoint | Method | What It Does | Use When |
|---|---|---|---|
| /api/generate | POST | Generate text from a Prompt | Single questions, summarize text, translate |
| /api/chat | POST | Chat with conversation history | Chatbot, AI assistant with memory |
| /api/embeddings | POST | Convert text to Vector | RAG, Semantic Search, document classification |
| /api/tags | GET | List available models | Show model dropdown in apps |
| /api/show | POST | Show model info (size, parameters) | Display model info in apps |
| /api/pull | POST | Download a model | Manage models via API |
| /api/delete | DELETE | Delete a model | Manage storage space |
| /v1/chat/completions | POST | OpenAI-compatible API | Drop-in replacement for OpenAI API in existing apps |
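Two endpoints from the table don't appear in the examples below, so here is a minimal Python sketch of /api/tags and /api/embeddings. It assumes the Ollama server is running locally and that an embedding model (nomic-embed-text is used here as an example) has already been pulled:

```python
import requests

OLLAMA = "http://localhost:11434"

def list_models():
    """GET /api/tags — return the names of locally available models."""
    r = requests.get(f"{OLLAMA}/api/tags")
    r.raise_for_status()
    return [m["name"] for m in r.json().get("models", [])]

def embed(text, model="nomic-embed-text"):
    """POST /api/embeddings — convert text to a vector (for RAG / search)."""
    r = requests.post(f"{OLLAMA}/api/embeddings",
                      json={"model": model, "prompt": text})
    r.raise_for_status()
    return r.json()["embedding"]

if __name__ == "__main__":
    print(list_models())                      # e.g. ['qwen2.5:latest', ...]
    print(len(embed("ERP integrates business data")))  # vector dimension
```

list_models() is handy for populating a model dropdown in an app, as the table suggests.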
/api/generate — Generate text from Prompt
The most basic endpoint — send a Prompt, get an answer back:
cURL
curl http://localhost:11434/api/generate -d '{
"model": "qwen2.5",
"prompt": "Explain what ERP is in 3 sentences",
"stream": false
}'
Python
import requests
response = requests.post("http://localhost:11434/api/generate", json={
"model": "qwen2.5",
"prompt": "Explain what ERP is in 3 sentences",
"stream": False
})
print(response.json()["response"])
JavaScript (Node.js / Fetch)
const response = await fetch("http://localhost:11434/api/generate", {
method: "POST",
headers: { "Content-Type": "application/json" },
body: JSON.stringify({
model: "qwen2.5",
prompt: "Explain what ERP is in 3 sentences",
stream: false
})
});
const data = await response.json();
console.log(data.response);
/api/chat — Chat with conversation history
The most important endpoint for building Chatbots — the API itself is stateless, so you send the full message history each turn, and that is how the AI "remembers" the previous conversation:
import requests
messages = [
{"role": "system", "content": "You are an ERP expert. Answer concisely."},
{"role": "user", "content": "What is ERP?"},
]
# Turn 1
response = requests.post("http://localhost:11434/api/chat", json={
"model": "qwen2.5",
"messages": messages,
"stream": False
})
assistant_reply = response.json()["message"]["content"]
print("AI:", assistant_reply)
# Turn 2 - AI remembers previous conversation
messages.append({"role": "assistant", "content": assistant_reply})
messages.append({"role": "user", "content": "What organization size is it suitable for?"})
response = requests.post("http://localhost:11434/api/chat", json={
"model": "qwen2.5",
"messages": messages,
"stream": False
})
print("AI:", response.json()["message"]["content"])
Structured Output — Force AI to Respond in JSON
One of Ollama API's most powerful features — you can force AI to respond in JSON following your defined Schema, making results immediately usable in apps without text parsing:
import requests, json
response = requests.post("http://localhost:11434/api/generate", json={
"model": "qwen2.5",
"prompt": "Analyze this email: 'Requesting to reschedule the meeting from Monday to Wednesday at 14:00 in Room A301'",
"stream": False,
"format": {
"type": "object",
"properties": {
"action": {"type": "string"},
"original_date": {"type": "string"},
"new_date": {"type": "string"},
"time": {"type": "string"},
"room": {"type": "string"},
"urgency": {"type": "string", "enum": ["low", "medium", "high"]}
},
"required": ["action", "new_date", "time"]
}
})
result = json.loads(response.json()["response"])
print(json.dumps(result, ensure_ascii=False, indent=2))
# Output:
# {
# "action": "reschedule_meeting",
# "original_date": "Monday",
# "new_date": "Wednesday",
# "time": "14:00",
# "room": "A301",
# "urgency": "medium"
# }
What can Structured Output do?
- Extract data from emails: automatically pull dates, times, locations, contacts
- Classify documents: AI reads documents and responds with JSON for category and priority level
- Generate system data: have AI create purchase orders as JSON and send them directly to the ERP system
- Analyze budget: AI reads reports and responds with JSON showing which items are over budget and by how much
Streaming — Receive Answers in Real-time
By default, Ollama API streams answers token by token — just like ChatGPT where text appears word by word, so users don't have to wait for the complete response:
import requests, json
# Streaming mode (default)
response = requests.post("http://localhost:11434/api/generate", json={
"model": "qwen2.5",
"prompt": "Explain what PDPA is",
"stream": True # default
}, stream=True)
for line in response.iter_lines():
if line:
data = json.loads(line)
print(data["response"], end="", flush=True)
if data.get("done"):
print() # New line when done
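The same streaming loop works for /api/chat, except each chunk carries the token in message.content instead of response. A minimal sketch, assuming the local Ollama server is running:

```python
import json
import requests

def stream_chat(prompt, model="qwen2.5"):
    """Stream a /api/chat response token by token (streaming is the default)."""
    response = requests.post("http://localhost:11434/api/chat", json={
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }, stream=True)
    for line in response.iter_lines():
        if not line:
            continue
        data = json.loads(line)
        # Chat chunks nest the token under "message", not "response"
        print(data["message"]["content"], end="", flush=True)
        if data.get("done"):
            print()  # new line when the stream finishes

if __name__ == "__main__":
    stream_chat("Explain what PDPA is")
```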
OpenAI-Compatible API — Drop-in OpenAI Replacement
Ollama supports the OpenAI-compatible API at /v1/chat/completions — meaning apps written for OpenAI API can switch to Ollama by just changing the URL with no other code changes:
# Use OpenAI Python SDK with Ollama
from openai import OpenAI
client = OpenAI(
base_url="http://localhost:11434/v1", # Just change the URL
api_key="ollama" # Any value works (Ollama doesn't need an API Key)
)
response = client.chat.completions.create(
model="qwen2.5",
messages=[
{"role": "system", "content": "You are an ERP expert"},
{"role": "user", "content": "How to choose the right ERP for your organization?"}
]
)
print(response.choices[0].message.content)
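Streaming also works through the compatibility layer by passing stream=True to the same SDK call. A sketch below; stream_openai_chat is our own illustrative wrapper, and it assumes the openai package is installed and the local Ollama server is running (the import is done inside the function so the file loads even before `pip install openai`):

```python
def stream_openai_chat(prompt, model="qwen2.5"):
    """Stream tokens through Ollama's OpenAI-compatible /v1 endpoint."""
    from openai import OpenAI  # imported lazily; requires `pip install openai`

    client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")
    stream = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        stream=True,  # chunks arrive as they are generated
    )
    for chunk in stream:
        delta = chunk.choices[0].delta.content
        if delta:
            print(delta, end="", flush=True)
    print()

if __name__ == "__main__":
    stream_openai_chat("What is MRP in one sentence?")
```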
Why OpenAI-Compatible?
- Apps already using OpenAI API (such as MCP, LangChain, AutoGen) can switch to Ollama immediately
- No API Key needed — save costs, no monthly fees
- Data stays local — switch from sending to OpenAI to processing on your machine
Real Example — Build a Web Chatbot
This example builds a simple Chatbot with Python + Flask connected to Ollama:
# file: chatbot.py
from flask import Flask, request, jsonify, render_template_string
import requests
app = Flask(__name__)
HTML = """
<!DOCTYPE html>
<html>
<head><title>ERP AI Assistant</title></head>
<body>
<h2>ERP AI Assistant (Powered by Ollama)</h2>
<div id="chat" style="height:400px;overflow-y:auto;border:1px solid #ccc;padding:10px;"></div>
<input id="msg" style="width:80%" placeholder="Ask about ERP...">
<button onclick="send()">Send</button>
<script>
async function send() {
const msg = document.getElementById('msg').value;
document.getElementById('chat').innerHTML += '<p><b>You:</b> ' + msg + '</p>';
document.getElementById('msg').value = '';
const res = await fetch('/chat', {
method: 'POST',
headers: {'Content-Type': 'application/json'},
body: JSON.stringify({message: msg})
});
const data = await res.json();
document.getElementById('chat').innerHTML += '<p><b>AI:</b> ' + data.reply + '</p>';
}
</script>
</body>
</html>
"""
@app.route("/")
def index():
return render_template_string(HTML)
@app.route("/chat", methods=["POST"])
def chat():
user_message = request.json["message"]
response = requests.post("http://localhost:11434/api/generate", json={
"model": "qwen2.5",
"prompt": user_message,
"system": "You are an ERP expert. Answer concisely.",
"stream": False
})
return jsonify({"reply": response.json()["response"]})
if __name__ == "__main__":
app.run(port=5000)
# Run
pip install flask requests
python chatbot.py
# Open http://localhost:5000
Connect with Automation — n8n / Make
Ollama API integrates easily with Automation tools — since it's a standard REST API, tools like n8n (Self-hosted) or Make (Cloud) can call Ollama API via HTTP Request Node:
| Workflow | Steps | Benefit |
|---|---|---|
| Summarize Email | Receive email → send content to Ollama for summary → send LINE notification | Executives read summaries instead of long emails |
| Classify Documents | Upload file → Ollama classifies → save to Database | Reduce manual document sorting |
| Translate Documents | Receive TH document → Ollama translates to EN → return via Webhook | Free translation, no Translation API costs |
| Anomaly Detection | Pull data from ERP → Ollama analyzes → alert if anomalies found | Proactive risk management |
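As a sketch, the "Summarize Email" workflow from the table reduces to three small functions in plain Python; notify() here is a hypothetical stand-in for the LINE notification step (swap in whatever channel your organization uses):

```python
import requests

OLLAMA_URL = "http://localhost:11434/api/generate"

def summarize_email(body, model="qwen2.5"):
    """Step 2: send the email body to Ollama and get a short summary back."""
    r = requests.post(OLLAMA_URL, json={
        "model": model,
        "prompt": f"Summarize this email in 3 bullet points:\n\n{body}",
        "stream": False,
    })
    r.raise_for_status()
    return r.json()["response"]

def notify(summary):
    """Step 3: hypothetical notification hook (LINE, Slack, email, ...)."""
    print("NOTIFY:", summary)

def handle_incoming_email(body):
    """Step 1 glue: an email arrives, a summary notification goes out."""
    notify(summarize_email(body))
```

In n8n or Make the same chain is just three nodes: an email trigger, an HTTP Request node POSTing the payload above to Ollama, and a notification node.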
Important Security Notes:
- Never expose Ollama API directly to the Internet — by default Ollama listens on localhost only (safe). If you need remote access, use a Reverse Proxy (nginx/Caddy) + Authentication
- No built-in Authentication: Ollama API has no API Key/Token — if opening to other machines, add an Auth Layer yourself
- Rate Limiting: if multiple users share the server, set Rate Limits to prevent overload
- Security details are covered in EP.6: Secure Self-Hosted AI
Saeree ERP + Ollama API:
With the Ollama API, organizations can connect AI directly to their ERP system — having AI summarize reports, analyze manufacturing costs, or help prepare budget data, with all data staying within the organization's network. Interested? Consult our team for free.
Ollama Series — Read More
Ollama Series — 6 Episodes, Complete Local AI Guide:
- EP.1: What Is Ollama? — Run AI on Your Own Machine
- EP.2: Install Ollama on Every OS — macOS / Windows / Linux
- EP.3: Using Ollama for Real — Choosing Models, Writing Prompts, and Creating Modelfiles
- EP.4: Ollama + RAG — Build AI That Answers from Your Documents
- EP.5: Ollama API — Connect AI to Your Apps and Enterprise Systems (this article)
- EP.6: Secure Self-Hosted AI — Security & Best Practices
"Ollama API transforms AI from a single-user Terminal tool into a service the entire organization can access — with just one HTTP Request."
- Saeree ERP Team



