
Local AI with Ollama

Ollama is an open-source tool that runs Large Language Models (LLMs) locally on your own infrastructure: models like Llama 3, Mistral, or Gemma run directly on your servers rather than on external cloud services.

Unlike cloud AI providers (OpenAI, Anthropic, Google Gemini), Ollama runs entirely within your environment, offering:

  • Cost reduction — no per-token charges. Once you have GPU hardware, costs are predictable regardless of usage volume.
  • Data privacy — all processing happens within your infrastructure. Telemetry data and AI analysis never leave your network, helping maintain compliance with GDPR, HIPAA, or industry-specific regulations.
  • Network independence — no dependency on internet connectivity or third-party availability, suitable for air-gapped facilities or critical infrastructure.

If ThingsBoard runs as a single service on one server, deploy Ollama on the same machine as an additional service. This works well when:

  • The server has GPU capabilities (recommended for acceptable performance).
  • Sufficient memory and CPU resources exist for both services.
  • AI workload is moderate.

Communication happens through localhost, keeping everything simple.
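
For example, on a Linux host you can install Ollama as a system service with the official script and confirm the API answers on its default port:

curl -fsSL https://ollama.com/install.sh | sh
curl http://localhost:11434/api/tags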

For Docker Compose deployments, you have two options:

  • Docker container: part of your existing stack, but may require extra configuration for GPU passthrough.
  • System service: GPU support is often configured automatically during installation, but the service lives outside your container stack.

Both approaches work well. System service installation typically provides easier GPU access.
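
If you choose the container route, GPU passthrough is a few extra lines of Compose configuration; a minimal sketch, assuming the NVIDIA Container Toolkit is installed on the host:

services:
  ollama:
    image: ollama/ollama
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]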

In Kubernetes environments, run Ollama on a separate node pool with GPU support:

  • Scalability — add GPU-enabled nodes as AI workload grows; Kubernetes distributes pods automatically.
  • Security — network policies, pod security standards, and ingress controllers provide fine-grained access control.
  • Complexity — requires the NVIDIA GPU Operator, node selectors / taints, resource quotas, and solid Kubernetes expertise (a minimal manifest sketch follows).
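
For illustration, a minimal Deployment sketch that pins Ollama to a GPU node pool (the accelerator node label is hypothetical; assumes the NVIDIA device plugin exposes nvidia.com/gpu):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: ollama
spec:
  replicas: 1
  selector:
    matchLabels:
      app: ollama
  template:
    metadata:
      labels:
        app: ollama
    spec:
      nodeSelector:
        accelerator: nvidia-gpu    # hypothetical label on the GPU node pool
      tolerations:
        - key: nvidia.com/gpu      # match the taint applied to GPU nodes
          operator: Exists
          effect: NoSchedule
      containers:
        - name: ollama
          image: ollama/ollama
          ports:
            - containerPort: 11434
          resources:
            limits:
              nvidia.com/gpu: 1    # one GPU per pod via the device plugin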

Alternatively, run Ollama on completely separate infrastructure: dedicated GPU-enabled servers optimized for AI workloads. ThingsBoard makes HTTP/HTTPS requests to the remote Ollama instance. This allows independent scaling and optimization of the AI and IoT workloads.

Ollama does not include built-in authentication. Without additional security layers, anyone who can reach the endpoint can use it.

Authentication is critical when:

  • Ollama is exposed to untrusted networks or the internet.
  • Multiple teams or projects share the same instance.
  • Compliance requirements mandate access controls.

Authentication may be less critical when:

  • Ollama runs within a fully trusted, isolated network.
  • Only ThingsBoard has network access to the endpoint.
  • Infrastructure already provides network-level security.

ThingsBoard supports three authentication methods when connecting to Ollama:

  • None: unauthenticated requests. Use when Ollama runs on the same server (localhost) or within an isolated network.
  • Basic: HTTP Basic authentication (username and password sent in an Authorization: Basic <encoded> header). Suited to small teams with minimal user management, with HTTPS configured.
  • Token: Bearer token (Authorization: Bearer <token> header). Suited to multiple teams, credential rotation, and audit trails; the industry standard.

For most production deployments (especially remote Ollama), Token authentication offers the best balance of security and usability.

This section demonstrates how to deploy Ollama with Nginx as a reverse proxy to add authentication. Both services run as Docker containers via Docker Compose.

Install Docker Desktop (includes Docker and Docker Compose) and ensure it is running.
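
Verify the installation:

docker --version
docker compose version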

Create the directory structure:

ollama-nginx-auth/
└── nginx/

All files below are created inside ollama-nginx-auth/.
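
For example:

mkdir -p ollama-nginx-auth/nginx
cd ollama-nginx-auth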

Approach 1: HTTP Basic Authentication

This method protects the endpoint with a username and password. Nginx checks credentials against a hashed .htpasswd file.

From the ollama-nginx-auth/ directory, create the .htpasswd file inside nginx/:

docker run --rm httpd:alpine htpasswd -nb myuser mypassword > ./nginx/.htpasswd

Create nginx/basic_auth.conf:

events {}

http {
  server {
    listen 80;

    location / {
      auth_basic "Restricted Access";
      auth_basic_user_file /etc/nginx/.htpasswd;

      proxy_pass http://ollama:11434;
      proxy_set_header Host $host;
      proxy_set_header X-Real-IP $remote_addr;
      proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;

      proxy_connect_timeout 300s;
      proxy_send_timeout 300s;
      proxy_read_timeout 300s;
    }
  }
}

Key settings:

  • auth_basic enables HTTP Basic Authentication.
  • auth_basic_user_file points to the password file inside the container.
  • proxy_pass forwards authenticated requests to the Ollama service.
  • Timeouts are increased to 300s to accommodate slow model responses.
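
For reference, curl's -u flag used in the tests below simply base64-encodes username:password and sends it in the Authorization header; once the stack is running, you can build the same request by hand:

printf 'myuser:mypassword' | base64
# bXl1c2VyOm15cGFzc3dvcmQ=
curl http://localhost:8880/api/tags \
  -H "Authorization: Basic bXl1c2VyOm15cGFzc3dvcmQ="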

Create docker-compose.basic.yml:

services:
  ollama:
    image: ollama/ollama
    container_name: ollama
    volumes:
      - ollama_data:/root/.ollama
    restart: unless-stopped

  nginx:
    image: nginx:latest
    container_name: nginx_proxy
    ports:
      - "8880:80"
    volumes:
      - ./nginx/basic_auth.conf:/etc/nginx/nginx.conf:ro
      - ./nginx/.htpasswd:/etc/nginx/.htpasswd:ro
    depends_on:
      - ollama
    restart: unless-stopped

volumes:
  ollama_data:

Start the services:

docker compose -f docker-compose.basic.yml up -d
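
Optionally, confirm that both containers are up:

docker compose -f docker-compose.basic.yml ps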

Pull a model (this may take some time):

docker exec -it ollama ollama pull gemma3:1b
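
You can verify the download by listing local models:

docker exec -it ollama ollama list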

Test with valid credentials:

curl http://localhost:8880/api/generate \
-u myuser:mypassword \
-d '{"model": "gemma3:1b", "prompt": "Why is the sky blue?", "stream": false}'

Test with incorrect credentials (should return 401 Unauthorized):

curl http://localhost:8880/api/generate \
-u wronguser:wrongpassword \
-d '{"model": "gemma3:1b", "prompt": "This will fail", "stream": false}'

Add a new user:

docker run --rm httpd:alpine htpasswd -nb anotheruser anotherpassword >> ./nginx/.htpasswd

To remove a user, open ./nginx/.htpasswd and delete the corresponding line. Changes take effect immediately without restarting Nginx.
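
Each line in the file pairs a username with its password hash, for example (hash truncated):

myuser:$apr1$...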


Approach 2: Bearer Token (API Key) Authentication

This method uses secret tokens stored in a text file. Nginx validates tokens via a Lua script.

Create nginx/api_keys.txt:

my-secret-api-key-1
admin-key-abcdef
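
These sample values are for demonstration only; for real deployments, generate long random tokens, for example:

openssl rand -hex 32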

Create nginx/bearer_token.conf:

events {}

http {
  server {
    listen 80;

    location / {
      access_by_lua_block {
        -- strip leading and trailing whitespace
        local function trim(s)
          return (s:gsub("^%s*(.-)%s*$", "%1"))
        end

        -- load the set of valid keys, one per line
        local function get_keys_from_file(path)
          local keys = {}
          local file = io.open(path, "r")
          if not file then
            ngx.log(ngx.ERR, "cannot open api keys file: ", path)
            return keys
          end
          for line in file:lines() do
            line = trim(line)
            if line ~= "" then
              keys[line] = true
            end
          end
          file:close()
          return keys
        end

        local api_keys_file = "/etc/nginx/api_keys.txt"
        local valid_keys = get_keys_from_file(api_keys_file)

        -- extract the token from "Authorization: Bearer <token>"
        local auth_header = ngx.var.http_authorization or ""
        local _, _, token = string.find(auth_header, "Bearer%s+(.+)")

        if not token or not valid_keys[token] then
          return ngx.exit(ngx.HTTP_UNAUTHORIZED)
        end
      }

      proxy_pass http://ollama:11434;
      proxy_set_header Host $host;
      proxy_set_header X-Real-IP $remote_addr;
      proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;

      proxy_connect_timeout 300s;
      proxy_send_timeout 300s;
      proxy_read_timeout 300s;
    }
  }
}

The access_by_lua_block reads valid keys from the file on every request, extracts the token from the Authorization: Bearer <token> header, and returns 401 Unauthorized if the token is missing or invalid.

Create docker-compose.bearer.yml. This uses the OpenResty image, which bundles Nginx with the Lua module:

services:
  ollama:
    image: ollama/ollama
    container_name: ollama
    volumes:
      - ollama_data:/root/.ollama
    restart: unless-stopped

  nginx:
    image: openresty/openresty:latest
    container_name: nginx_proxy
    ports:
      - "8880:80"
    volumes:
      - ./nginx/bearer_token.conf:/usr/local/openresty/nginx/conf/nginx.conf:ro
      - ./nginx/api_keys.txt:/etc/nginx/api_keys.txt:ro
    depends_on:
      - ollama
    restart: unless-stopped

volumes:
  ollama_data:

Start the services:

docker compose -f docker-compose.bearer.yml up -d

Pull a model:

docker exec -it ollama ollama pull gemma3:1b

Test with a valid API key:

curl http://localhost:8880/api/generate \
-H "Authorization: Bearer my-secret-api-key-1" \
-d '{"model": "gemma3:1b", "prompt": "Explain black holes to a 5-year-old", "stream": false}'

Test with an invalid key (should return 401 Unauthorized):

curl http://localhost:8880/api/generate -v \
-H "Authorization: Bearer invalid-key" \
-d '{"model": "gemma3:1b", "prompt": "This will fail", "stream": false}'

Edit nginx/api_keys.txt — add, change, or remove keys (one per line). Changes take effect immediately on the next request because the Lua script reads the file on every request.
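
For example, assuming GNU sed, you could append a freshly generated key and drop a revoked one (the revoked value below is hypothetical):

# append a new random key
openssl rand -hex 32 >> ./nginx/api_keys.txt
# remove a revoked key by its exact value
sed -i '/^old-revoked-key$/d' ./nginx/api_keys.txt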


To start or stop the services:

# Start
docker compose -f <compose-file-name> up -d
# Stop
docker compose -f <compose-file-name> down

Replace <compose-file-name> with docker-compose.basic.yml or docker-compose.bearer.yml.

Once Ollama is deployed, connect it to ThingsBoard through the AI models configuration page.

  • Provider: select Ollama from the dropdown.
  • Base URL: the HTTP/HTTPS endpoint of your Ollama instance (e.g., http://localhost:11434, http://192.168.1.100:8880, https://ollama.yourdomain.com).
  • Authentication: choose None, Basic (username + password), or Token (API key).
  • Model ID: the Ollama model to use (e.g., llama3:8b, mistral:7b, gemma3:1b). Must match a model you have pulled.
  • Temperature, Top P, Top K, Max tokens: control the model's response behavior; configure according to your use case.
  • Context length: the total number of tokens the model can process per request (input + output).

Context length significantly impacts GPU memory usage. Unlike cloud services, which scale automatically, Ollama runs on fixed hardware that you manage yourself.

Start with a reasonable estimate based on your typical input size plus expected output length, then adjust:

  • If requests are being truncated, increase the context length.
  • If memory usage is too high or performance suffers, reduce it or switch to a smaller model.

For example, if a typical prompt is around 3,000 tokens and responses run up to 1,000 tokens, a context length of 4,096 is a reasonable starting point.

Click Check connectivity at the bottom of the form. A green checkmark confirms that ThingsBoard can communicate with your Ollama endpoint and the specified model is available.