

Ollama API — Connect AI to Apps and Enterprise Systems

3 April

Ollama Series EP.5 — In EP.4 we built RAG to answer questions from documents; now it's time to connect Ollama to real applications! Ollama has a built-in REST API running on port 11434 — callable from Python, JavaScript, cURL, or any language that can send HTTP requests. This article covers every API endpoint, with ready-to-use code examples for chatbots, document summarization, automation workflows, and Structured Output (JSON Mode).

In short — What can Ollama API do?

  • REST API runs on http://localhost:11434 — callable from any language
  • /api/generate — Generate text from a Prompt (Single-turn)
  • /api/chat — Chat with conversation history (Multi-turn)
  • /api/embeddings — Convert text to Vectors (for RAG)
  • Structured Output — Force AI to respond in JSON following a defined Schema
  • Streaming — Receive answers word-by-word in real-time (like ChatGPT)
  • Compatible with OpenAI API — Drop-in replacement for OpenAI in existing apps

Ollama API — Endpoints Overview

Ollama Server runs on http://localhost:11434 automatically (starts after installing Ollama). Key endpoints:

Endpoint | Method | What It Does | Use When
/api/generate | POST | Generate text from a prompt | Single questions, summarize text, translate
/api/chat | POST | Chat with conversation history | Chatbot, AI assistant with memory
/api/embeddings | POST | Convert text to a vector | RAG, semantic search, document classification
/api/tags | GET | List available models | Show a model dropdown in apps
/api/show | POST | Show model info (size, parameters) | Display model info in apps
/api/pull | POST | Download a model | Manage models via API
/api/delete | DELETE | Delete a model | Manage storage space
/v1/chat/completions | POST | OpenAI-compatible API | Drop-in replacement for OpenAI API in existing apps
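The table lists /api/embeddings, but the rest of this article focuses on generation — so here is a minimal sketch of that endpoint. It assumes an embedding model is already pulled (for example, ollama pull nomic-embed-text; the model choice is ours, not prescribed by Ollama). The endpoint returns an "embedding" vector, which you can compare with cosine similarity for RAG-style ranking:

```python
import math

def embed(text, model="nomic-embed-text"):
    """Call /api/embeddings and return the embedding vector.
    Assumes the model has been pulled beforehand."""
    import requests  # third-party: pip install requests
    r = requests.post("http://localhost:11434/api/embeddings",
                      json={"model": model, "prompt": text})
    return r.json()["embedding"]

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# Usage (requires a running Ollama server):
# q = embed("How do I approve a purchase order?")
# d = embed("Purchase order approval workflow in ERP")
# print("similarity:", cosine(q, d))
```

Any embedding model you have pulled will work in place of nomic-embed-text.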

/api/generate — Generate text from Prompt

The most basic endpoint — send a Prompt, get an answer back:

cURL

curl http://localhost:11434/api/generate -d '{
  "model": "qwen2.5",
  "prompt": "Explain what ERP is in 3 sentences",
  "stream": false
}'

Python

import requests

response = requests.post("http://localhost:11434/api/generate", json={
    "model": "qwen2.5",
    "prompt": "Explain what ERP is in 3 sentences",
    "stream": False
})

print(response.json()["response"])

JavaScript (Node.js / Fetch)

const response = await fetch("http://localhost:11434/api/generate", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({
    model: "qwen2.5",
    prompt: "Explain what ERP is in 3 sentences",
    stream: false
  })
});

const data = await response.json();
console.log(data.response);

/api/chat — Chat with conversation history

The most important endpoint for building chatbots — because the AI "remembers" the previous conversation:

import requests

messages = [
    {"role": "system", "content": "You are an ERP expert. Answer concisely."},
    {"role": "user", "content": "What is ERP?"},
]

# Turn 1
response = requests.post("http://localhost:11434/api/chat", json={
    "model": "qwen2.5",
    "messages": messages,
    "stream": False
})

assistant_reply = response.json()["message"]["content"]
print("AI:", assistant_reply)

# Turn 2 - AI remembers previous conversation
messages.append({"role": "assistant", "content": assistant_reply})
messages.append({"role": "user", "content": "What organization size is it suitable for?"})

response = requests.post("http://localhost:11434/api/chat", json={
    "model": "qwen2.5",
    "messages": messages,
    "stream": False
})

print("AI:", response.json()["message"]["content"])
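In a long-running chatbot, the messages list grows on every turn and will eventually exceed the model's context window. One common fix is to keep the system prompt plus only the last few turns — a sketch (the helper names here are this article's own, not part of Ollama):

```python
def trim_history(messages, max_turns=5):
    """Keep the system message (if any) plus the last max_turns
    user/assistant messages' worth of history, bounding the context."""
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    return system + rest[-max_turns * 2:]

def chat(messages, model="qwen2.5"):
    """Send the trimmed history to /api/chat and return the reply text."""
    import requests  # third-party: pip install requests
    r = requests.post("http://localhost:11434/api/chat", json={
        "model": model,
        "messages": trim_history(messages),
        "stream": False,
    })
    return r.json()["message"]["content"]
```

Trimming loses old context, so for assistants that must recall early details you would summarize dropped turns instead of discarding them.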

Structured Output — Force AI to Respond in JSON

One of Ollama API's most powerful features — you can force AI to respond in JSON following your defined Schema, making results immediately usable in apps without text parsing:

import requests, json

response = requests.post("http://localhost:11434/api/generate", json={
    "model": "qwen2.5",
    "prompt": "Analyze this email: 'Requesting to reschedule the meeting from Monday to Wednesday at 14:00 in Room A301'",
    "stream": False,
    "format": {
        "type": "object",
        "properties": {
            "action": {"type": "string"},
            "original_date": {"type": "string"},
            "new_date": {"type": "string"},
            "time": {"type": "string"},
            "room": {"type": "string"},
            "urgency": {"type": "string", "enum": ["low", "medium", "high"]}
        },
        "required": ["action", "new_date", "time"]
    }
})

result = json.loads(response.json()["response"])
print(json.dumps(result, ensure_ascii=False, indent=2))

# Output:
# {
#   "action": "reschedule_meeting",
#   "original_date": "Monday",
#   "new_date": "Wednesday",
#   "time": "14:00",
#   "room": "A301",
#   "urgency": "medium"
# }

What can Structured Output do?

  • Extract data from emails — automatically pull dates, times, locations, and contacts
  • Classify documents — AI reads a document and returns JSON with its category and priority level
  • Generate system data — have AI create purchase orders as JSON and send them directly to the ERP system
  • Analyze budgets — AI reads reports and returns JSON showing which items are over budget and by how much
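The "classify documents" case can be sketched end-to-end: define a JSON Schema with an enum of categories (the categories below are invented for illustration), pass it via "format", and sanity-check the parsed result before trusting it:

```python
import json

# Hypothetical categories for an ERP document inbox
CLASSIFY_SCHEMA = {
    "type": "object",
    "properties": {
        "category": {"type": "string",
                     "enum": ["invoice", "purchase_order", "contract", "other"]},
        "priority": {"type": "string", "enum": ["low", "medium", "high"]},
    },
    "required": ["category", "priority"],
}

def validate(result, schema=CLASSIFY_SCHEMA):
    """Minimal sanity check: required keys present, enum values allowed."""
    for key in schema["required"]:
        if key not in result:
            return False
    for key, spec in schema["properties"].items():
        if "enum" in spec and key in result and result[key] not in spec["enum"]:
            return False
    return True

def classify(text, model="qwen2.5"):
    """Ask Ollama to classify a document, constrained by the schema."""
    import requests  # third-party: pip install requests
    r = requests.post("http://localhost:11434/api/generate", json={
        "model": model,
        "prompt": "Classify this document:\n" + text,
        "stream": False,
        "format": CLASSIFY_SCHEMA,
    })
    return json.loads(r.json()["response"])
```

The validate step matters because schema enforcement constrains the output shape, but your app should still fail gracefully if parsing ever surprises it.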

Streaming — Receive Answers in Real-time

By default, Ollama API streams answers token by token — just like ChatGPT where text appears word by word, so users don't have to wait for the complete response:

import requests, json

# Streaming mode (default)
response = requests.post("http://localhost:11434/api/generate", json={
    "model": "qwen2.5",
    "prompt": "Explain what PDPA is",
    "stream": True  # default
}, stream=True)

for line in response.iter_lines():
    if line:
        data = json.loads(line)
        print(data["response"], end="", flush=True)
        if data.get("done"):
            print()  # New line when done
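Each streamed line above is one standalone JSON object (NDJSON). The parsing step can be factored into a small function — useful because it can then be tested without a live server:

```python
import json

def collect_stream(lines):
    """Join the 'response' fragments from an Ollama NDJSON stream
    into one string, stopping at the chunk marked done."""
    parts = []
    for line in lines:
        if not line:
            continue  # skip keep-alive blank lines
        data = json.loads(line)
        parts.append(data.get("response", ""))
        if data.get("done"):
            break
    return "".join(parts)

# Usage with the streaming request above:
# print(collect_stream(response.iter_lines()))
```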

OpenAI-Compatible API — Drop-in OpenAI Replacement

Ollama supports the OpenAI-compatible API at /v1/chat/completions — meaning apps written for OpenAI API can switch to Ollama by just changing the URL with no other code changes:

# Use OpenAI Python SDK with Ollama
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:11434/v1",  # Just change the URL
    api_key="ollama"  # Any value works (Ollama doesn't need an API Key)
)

response = client.chat.completions.create(
    model="qwen2.5",
    messages=[
        {"role": "system", "content": "You are an ERP expert"},
        {"role": "user", "content": "How to choose the right ERP for your organization?"}
    ]
)

print(response.choices[0].message.content)

Why OpenAI-Compatible?

  • Apps already using OpenAI API (such as MCP, LangChain, AutoGen) can switch to Ollama immediately
  • No API Key needed — save costs, no monthly fees
  • Data stays local — switch from sending to OpenAI to processing on your machine
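A practical pattern here is to keep one code path and pick the backend at runtime. The sketch below builds the keyword arguments for openai.OpenAI(); the USE_LOCAL_LLM variable name is this article's own convention, not a standard:

```python
import os

def client_kwargs(use_local=None):
    """Return kwargs for openai.OpenAI(): point at Ollama when
    USE_LOCAL_LLM=1 (or use_local=True), else use OpenAI's defaults."""
    if use_local is None:
        use_local = os.environ.get("USE_LOCAL_LLM") == "1"
    if use_local:
        # Ollama ignores the key, but the SDK requires a non-empty string
        return {"base_url": "http://localhost:11434/v1", "api_key": "ollama"}
    return {}  # SDK default: api.openai.com, key from OPENAI_API_KEY

# Usage:
# from openai import OpenAI
# client = OpenAI(**client_kwargs())
```

This lets you develop against Ollama locally and flip to a hosted model in production (or vice versa) without touching application code.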

Real Example — Build a Web Chatbot

This example builds a simple Chatbot with Python + Flask connected to Ollama:

# file: chatbot.py
from flask import Flask, request, jsonify, render_template_string
import requests

app = Flask(__name__)

HTML = """
<!DOCTYPE html>
<html>
<head><title>ERP AI Assistant</title></head>
<body>
<h2>ERP AI Assistant (Powered by Ollama)</h2>
<div id="chat" style="height:400px;overflow-y:auto;border:1px solid #ccc;padding:10px;"></div>
<input id="msg" style="width:80%" placeholder="Ask about ERP...">
<button onclick="send()">Send</button>
<script>
async function send() {
  const msg = document.getElementById('msg').value;
  document.getElementById('chat').innerHTML += '<p><b>You:</b> ' + msg + '</p>';
  document.getElementById('msg').value = '';
  const res = await fetch('/chat', {
    method: 'POST',
    headers: {'Content-Type': 'application/json'},
    body: JSON.stringify({message: msg})
  });
  const data = await res.json();
  document.getElementById('chat').innerHTML += '<p><b>AI:</b> ' + data.reply + '</p>';
}
</script>
</body>
</html>
"""

@app.route("/")
def index():
    return render_template_string(HTML)

@app.route("/chat", methods=["POST"])
def chat():
    user_message = request.json["message"]
    response = requests.post("http://localhost:11434/api/generate", json={
        "model": "qwen2.5",
        "prompt": user_message,
        "system": "You are an ERP expert. Answer concisely.",
        "stream": False
    })
    return jsonify({"reply": response.json()["response"]})

if __name__ == "__main__":
    app.run(port=5000)
Run it:

pip install flask requests
python chatbot.py
# then open http://localhost:5000

Connect with Automation — n8n / Make

Ollama API integrates easily with Automation tools — since it's a standard REST API, tools like n8n (Self-hosted) or Make (Cloud) can call Ollama API via HTTP Request Node:

Workflow | Steps | Benefit
Summarize Email | Receive email → send content to Ollama for a summary → send a LINE notification | Executives read summaries instead of long emails
Classify Documents | Upload file → Ollama classifies → save to database | Reduce manual document sorting
Translate Documents | Receive TH document → Ollama translates to EN → return via webhook | Free translation, no Translation API costs
Anomaly Detection | Pull data from ERP → Ollama analyzes → alert if anomalies found | Proactive risk management
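For the "Summarize Email" workflow, the HTTP Request node only needs a JSON body like the one this helper builds (the prompt wording and the 2,000-character cap are arbitrary choices for this sketch):

```python
def summary_request_body(email_text, model="qwen2.5", max_chars=2000):
    """Build the JSON body to POST to http://localhost:11434/api/generate
    from an automation tool's HTTP Request node."""
    return {
        "model": model,
        "prompt": "Summarize this email in 3 bullet points:\n\n"
                  + email_text[:max_chars],  # cap input length
        "stream": False,  # automation tools want one complete response
    }
```

With stream set to false, the node gets a single JSON reply whose "response" field can be piped straight into the next step (e.g. the LINE notification).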

Important Security Notes:

  • Never expose Ollama API directly to the Internet — by default Ollama listens on localhost only (safe). If you need remote access, put it behind a reverse proxy (nginx/Caddy) with authentication
  • No built-in authentication — Ollama API has no API key/token; if you open it to other machines, add an auth layer yourself
  • Rate limiting — if multiple users share the server, set rate limits to prevent overload
  • Security details are covered in EP.6: Secure Self-Hosted AI

Saeree ERP + Ollama API

With the Ollama API, organizations can connect AI directly to their ERP system — having AI summarize reports, analyze manufacturing costs, or help prepare budget data, with all data staying inside the organization's network. Interested? Consult our team for free.


"Ollama API transforms AI from a single-user Terminal tool into a service the entire organization can access — with just one HTTP Request."

- Saeree ERP Team


Interested in ERP for Your Organization?

Consult with Grand Linux Solution experts — free of charge

Request a Free Demo

Call 02-347-7730 | sale@grandlinux.com


About the Author

Paitoon Butri

Network & Server Security Specialist, Grand Linux Solution Co., Ltd.