Architecture Overview
ThingsBoard is a multi-tenant IoT platform designed to scale horizontally. Add more instances of any service to handle more load. There is no single point of failure — the system continues operating when individual instances go down.
Technology Stack
ThingsBoard relies on the following third-party components:
| Component | Role | Required? |
|---|---|---|
| PostgreSQL | Entity storage (devices, users, dashboards, rule chains) | Always required |
| Apache Kafka | Message queue between services in cluster mode | Production clusters |
| Redis / Valkey | Shared cache for credentials, sessions, profiles | Multi-node deployments |
| Apache Cassandra | Time-series storage (Hybrid mode) | High-throughput production |
| Zookeeper | Service discovery, partition assignment | Cluster mode |
| HAProxy / nginx | Load balancer for HTTP, MQTT, CoAP | Cluster mode |
Single-node deployments require only PostgreSQL. All other components are added as you scale.
Database Layer
ThingsBoard stores entities and attributes in PostgreSQL. For time-series data, you can choose between two strategies:
| Strategy | Time-series storage | Write throughput | Recommended for |
|---|---|---|---|
| SQL | PostgreSQL | ≤5–10K data points/sec | Small deployments, simplicity |
| Hybrid | Cassandra | ~1M data points/sec | High-throughput production |
See Database Layer for write batching, partitioning strategies, Cassandra tuning, and storage efficiency comparison.
Deployment Modes
ThingsBoard supports two deployment architectures:
| Mode | Description | Best for |
|---|---|---|
| Monolithic | All services in a single Java process | Development, small-to-medium workloads |
| Microservices | Each service runs as a separate container | Production, horizontal scaling, HA |
Both modes can run as a single server or as a cluster. See Deployment Scenarios for sizing guidance and Performance for benchmark results.
Multi-tenancy and Tenant Isolation
Tenant isolation is enforced at every layer of the platform. The entity hierarchy — Tenant → Customer → Devices & Assets — determines access boundaries. Each tenant has independent rule chains, dashboards, device profiles, and user roles. System resources (database partitions, message queue topics, cache entries) are partitioned per tenant, so one tenant’s traffic spike cannot affect another.
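The hierarchy above translates directly into an access check. A minimal sketch (entity names, fields, and the `can_access` helper are illustrative, not the platform's actual model):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Device:
    device_id: str
    tenant_id: str
    customer_id: Optional[str] = None  # None until assigned to a customer

def can_access(user_tenant: str, user_customer: Optional[str], device: Device) -> bool:
    """Tenant-level users see every device in their tenant; customer
    users see only devices assigned to their customer."""
    if device.tenant_id != user_tenant:
        return False                       # hard tenant boundary
    if user_customer is None:
        return True                        # tenant administrator
    return device.customer_id == user_customer

d = Device("dev-1", tenant_id="t1", customer_id="c1")
print(can_access("t1", None, d))    # True: tenant admin of t1
print(can_access("t1", "c2", d))    # False: wrong customer
print(can_access("t2", None, d))    # False: wrong tenant
```

The key property is that the tenant check comes first and is absolute: no role or customer assignment can cross it.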
Tenant profiles define per-tenant resource limits and quotas:
| Category | What it controls |
|---|---|
| Entity limits | Maximum devices, assets, dashboards, users, and other entities per tenant |
| API limits | Rule engine executions and transport messages per second |
| Rate limits | Per-transport and per-device message rate limiting (see Rate Limiting below) |
| Data retention (TTL) | Per-tenant time-to-live for telemetry, alarms, RPC, audit logs |
| WebSocket limits | Maximum sessions and subscriptions per tenant and customer |
| Isolated queues | Dedicated Rule Engine Queues per tenant for performance isolation |
| Notifications | Email/SMS quotas, alarm quotas |
| Calculated fields | Maximum calculated fields per entity, arguments, rolling data points |
Authentication
ThingsBoard uses different authentication mechanisms for devices and users.
Device authentication happens on every incoming message. When a device connects via MQTT, HTTP, or CoAP, the Transport layer validates its credentials before accepting any data. Supported credential types:
| Credential Type | Protocol | How it works |
|---|---|---|
| Access Token | MQTT, HTTP, CoAP | String token sent as MQTT username or HTTP/CoAP query parameter |
| MQTT Basic | MQTT | Client ID + username + password combination |
| X.509 Certificate | MQTT (TLS) | Device presents a client certificate during TLS handshake |
Credentials are cached after the first lookup — subsequent messages from the same device are validated from cache without a database query. This is critical for throughput: at 100K messages/second, a database lookup per credential check would overwhelm PostgreSQL.
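The cache-first lookup can be sketched with `functools.lru_cache` standing in for the platform's credential cache, and a dict standing in for the PostgreSQL credentials table (all names here are hypothetical):

```python
import functools

DB = {"A1_TOKEN": "device-a1"}   # stand-in for the credentials table
db_hits = 0                      # counts simulated database queries

@functools.lru_cache(maxsize=100_000)
def authenticate(token: str):
    """First lookup goes to the 'database'; repeats hit the cache."""
    global db_hits
    db_hits += 1
    return DB.get(token)         # None means invalid credentials

for _ in range(1000):
    assert authenticate("A1_TOKEN") == "device-a1"
print(db_hits)  # 1 -- one database query for a thousand messages
```

Note that invalid tokens are cached too, which also shields the database from repeated bad-credential attempts (the real cache additionally needs invalidation when credentials change, which this sketch omits).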
User authentication uses JWT tokens for REST API and WebSocket connections. Users authenticate once via login endpoint, receive a JWT token pair (access + refresh), and include the access token in subsequent API requests. Token validation is stateless — the server verifies the JWT signature without a database lookup.
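Stateless validation means the server only recomputes and compares the token signature. A standard-library sketch of HS256 verification (the platform's actual claims, key management, and token layout will differ):

```python
import base64, hashlib, hmac, json, time
from typing import Optional

def b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()

def sign_jwt(claims: dict, secret: bytes) -> str:
    header = b64url(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    payload = b64url(json.dumps(claims).encode())
    sig = hmac.new(secret, f"{header}.{payload}".encode(), hashlib.sha256).digest()
    return f"{header}.{payload}.{b64url(sig)}"

def verify_jwt(token: str, secret: bytes) -> Optional[dict]:
    """Stateless: recompute the HMAC signature; no database lookup."""
    header, payload, sig = token.split(".")
    expected = hmac.new(secret, f"{header}.{payload}".encode(), hashlib.sha256).digest()
    if not hmac.compare_digest(b64url(expected), sig):
        return None  # forged or tampered token
    claims = json.loads(base64.urlsafe_b64decode(payload + "=" * (-len(payload) % 4)))
    if claims.get("exp", 0) < time.time():
        return None  # expired access token; the client must use its refresh token
    return claims

secret = b"server-side-signing-key"  # hypothetical signing secret
token = sign_jwt({"sub": "user@example.com", "exp": time.time() + 900}, secret)
print(verify_jwt(token, secret)["sub"])  # user@example.com
print(verify_jwt(token, b"wrong-key"))   # None
```

Because verification needs only the signing key, any node in a cluster can validate any user's token without coordination.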
Data Flow
When a device sends data, it follows the same path regardless of deployment mode (monolithic or microservices):
- Connect — Device opens a connection to the Transport layer (MQTT, HTTP, CoAP, or LwM2M).
- Resolve tenant — Transport identifies the owning tenant from device credentials.
- Authenticate — Transport validates device credentials against the Core service. Invalid credentials are rejected immediately. Credentials are cached to avoid a database query on every message.
- Push to Rule Engine — Transport converts the protocol-specific payload into a unified message and forwards it to the Rule Engine via the message queue.
- Execute Rule Chains — The Rule Engine processes the message through the tenant’s configured rule chains — filtering, enriching, transforming, and triggering actions.
- Persist — Action nodes save telemetry, attributes, or alarms to the database.
- Notify — Core pushes real-time updates to WebSocket subscribers (dashboards).
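The steps above can be condensed into a toy in-process pipeline, with Python stand-ins for the queue, the database, and WebSocket sessions (structure only; the real services run in separate processes):

```python
import json, queue

msg_queue = queue.Queue()   # stands in for Kafka (or the in-memory queue)
telemetry_db = {}           # stands in for PostgreSQL / Cassandra
subscribers = []            # WebSocket sessions, here plain callbacks

def transport_receive(device_id, raw_payload):
    """Steps 1-4: accept the payload, convert it to a unified
    message, and push it onto the queue."""
    msg_queue.put({"device": device_id, "data": json.loads(raw_payload)})

def rule_engine_poll():
    """Steps 5-7: run a trivial rule chain (filter), persist, notify."""
    msg = msg_queue.get()
    if msg["data"].get("temperature", 0) > 100:  # filter node: drop outliers
        return
    telemetry_db.setdefault(msg["device"], []).append(msg["data"])  # persist
    for push in subscribers:                                        # notify
        push(msg)

subscribers.append(lambda m: print("update:", m["device"], m["data"]))
transport_receive("dev-1", '{"temperature": 21.5}')
rule_engine_poll()
print(telemetry_db["dev-1"])  # [{'temperature': 21.5}]
```

The queue in the middle is what decouples transports from the rule engine: either side can be scaled or restarted independently.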
Server-side RPC
The reverse data path — server to device — follows a different route. When a user or rule chain sends a command to a device:
- Initiate — A REST API call or rule chain action node creates an RPC request for a specific device.
- Route to owner — The Core service identifies which ThingsBoard node owns the target device’s actor (via consistent hashing). In a cluster, the request is forwarded via gRPC to the correct node.
- Device actor — The device’s actor receives the RPC request, checks whether the device has an active session, and forwards the command to the Transport layer via the message queue.
- Deliver — The Transport layer sends the command to the device over the device’s active connection (MQTT publish, CoAP response, etc.).
- Response — The device responds (optional), and the response flows back through the same path in reverse.
If the device is offline, the behavior depends on RPC type: one-way RPC is fire-and-forget, while two-way RPC waits for a response with a configurable timeout.
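The two-way timeout behavior can be sketched with a queue and a thread (timing values and names are illustrative):

```python
import queue, threading, time

response_box = queue.Queue()  # carries the device's reply back to the caller

def device_session(request: str, online: bool) -> None:
    """Transport side: deliver over the active connection, if any."""
    if online:
        time.sleep(0.05)                    # simulated network round-trip
        response_box.put(f"ack:{request}")  # the device replies

def two_way_rpc(request: str, online: bool, timeout: float = 0.5):
    threading.Thread(target=device_session, args=(request, online)).start()
    try:
        return response_box.get(timeout=timeout)  # wait for the reply
    except queue.Empty:
        return None  # device offline or too slow: the RPC times out

print(two_way_rpc("reboot", online=True))   # ack:reboot
print(two_way_rpc("reboot", online=False))  # None
```

A one-way RPC would simply skip the `get` and return as soon as the command is handed to the transport layer.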
WebSocket Subscriptions
Dashboards and the REST API use WebSocket connections for real-time updates. When a user opens a dashboard:
- Connect — The browser opens a WebSocket connection to a ThingsBoard Node.
- Subscribe — The dashboard sends subscription commands for specific entity attributes or time-series keys (e.g., “subscribe to device ABC, key temperature”).
- Register — The TB Node registers the subscription in the subscription manager. In a cluster, subscriptions are tracked per-node, and subscription notifications flow via the `tb_core.notifications` Kafka topic.
- Update — When new telemetry arrives (from a device, rule chain, or API), the node processing the telemetry detects matching subscriptions and pushes the update to the subscriber’s WebSocket session.
This means real-time updates work across cluster nodes — the device’s telemetry may be processed by node A, while the dashboard is connected to node B. The notification flows through Kafka to ensure delivery.
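The cross-node path can be sketched with two toy nodes and a plain list standing in for the notifications topic (class layout is illustrative, not the platform's subscription manager):

```python
from collections import defaultdict

notifications_topic = []  # stands in for the tb_core.notifications Kafka topic

class Node:
    def __init__(self, name):
        self.name = name
        self.subs = defaultdict(list)  # (entity, key) -> websocket sessions

    def subscribe(self, key, session):
        self.subs[key].append(session)

    def on_telemetry(self, key, value):
        """Any node may process the telemetry; it publishes a notification."""
        notifications_topic.append((key, value))

    def consume(self, delivered):
        """Each node delivers notifications that match its own sessions."""
        for key, value in notifications_topic:
            for session in self.subs.get(key, []):
                delivered.append((self.name, session, key, value))

node_a, node_b = Node("A"), Node("B")
node_b.subscribe(("dev-1", "temperature"), "browser-session-42")
node_a.on_telemetry(("dev-1", "temperature"), 21.5)  # processed on node A

out = []
node_b.consume(out)  # the dashboard connected to node B still gets the update
print(out)  # [('B', 'browser-session-42', ('dev-1', 'temperature'), 21.5)]
```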
Rate Limiting
ThingsBoard enforces rate limits at multiple levels to protect against traffic spikes and ensure fair resource sharing:
| Level | Where enforced | What it controls |
|---|---|---|
| Transport | Transport layer | Messages per second per device and per tenant (MQTT, HTTP, CoAP, LwM2M) |
| REST API | Core service | API calls per second per tenant and per customer |
| Rule Engine | Rule Engine | Rule engine messages per second per tenant |
| WebSocket | Core service | Subscription updates per second per session |
| Notifications | Core service | Email/SMS per hour per tenant |
Rate limit configuration is set per tenant profile. When a limit is exceeded, the platform rejects additional messages with an appropriate error (e.g., MQTT CONNACK with “rate limit” reason, HTTP 429 Too Many Requests). Rate limit counters are stored in the cache (the `rateLimits` namespace) for fast, distributed enforcement across cluster nodes.
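ThingsBoard expresses such limits as comma-separated `limit:windowSeconds` pairs, e.g. `100:1,1000:60` for "at most 100 per second and 1000 per minute". A minimal sliding-window sketch of that rule format (class name and internals are illustrative, not the platform's implementation):

```python
import time
from collections import deque

class SlidingWindowLimiter:
    """Multi-window limiter for rules like '100:1,1000:60'."""
    def __init__(self, spec: str):
        self.rules = [tuple(map(int, rule.split(":"))) for rule in spec.split(",")]
        self.events = deque()  # timestamps of accepted messages

    def try_consume(self, now=None) -> bool:
        now = time.monotonic() if now is None else now
        for limit, window in self.rules:
            # Count accepted events still inside this rule's window.
            if sum(1 for t in self.events if now - t < window) >= limit:
                return False  # caller answers with 429 / MQTT rejection
        self.events.append(now)
        return True

rl = SlidingWindowLimiter("3:1")  # at most 3 messages per second
print([rl.try_consume(now=0.0) for _ in range(4)])  # [True, True, True, False]
print(rl.try_consume(now=2.0))                      # True: the window slid
```

A production limiter would also prune old timestamps and keep its counters in the shared cache so all cluster nodes see the same state.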
Services
ThingsBoard is split into the following service groups:
| Service | Responsibility | Protocols |
|---|---|---|
| Transports | Accept device connections, convert protocol-specific payloads into a unified format, push messages to the rule engine. Each protocol runs in its own process. | MQTT, HTTP, CoAP, LwM2M, SNMP |
| ThingsBoard Node | Core service — REST API, WebSocket subscriptions, Rule Engine, Actor System, device connectivity state, entity management | HTTP/REST, WebSocket |
| JS Executor | Execute user-defined JavaScript from rule engine script nodes in isolated sandboxes. Deploy 20+ instances in production. | Internal (Kafka) |
| Web UI | Serve the Angular-based dashboard application, proxy REST/WS to ThingsBoard Node | HTTP |
Integration Executor — runs platform integrations (OPC-UA, SigFox, TheThingsNetwork, etc.), pulling data from external systems and pushing it into ThingsBoard via the rule engine. Available in Professional Edition and Cloud only.
Transport services are protocol-specific — each protocol runs in its own process (or pod in Kubernetes). See the individual API references for protocol details: MQTT API, HTTP API, CoAP API, LwM2M API.
Internal Components
These components work identically in both monolithic and microservices deployments:
| Component | Purpose |
|---|---|
| Actor System | Manages isolated per-entity state (device actors, rule chain actors, tenant actors). Each actor processes messages sequentially, eliminating concurrency issues. |
| Cache Layer | Caches device credentials, entity profiles, attributes, and sessions to avoid database queries on every message. Supports Caffeine (local) and Redis (shared). |
| Service Discovery | In cluster mode, Zookeeper tracks which nodes are alive. Single-node deployments skip Zookeeper entirely. |
| Consistent Hashing | Distributes entities across cluster nodes deterministically. When a node joins or leaves, only a fraction of entities re-balance. |
| gRPC | Inter-node communication in cluster mode. Forwards messages to the node that owns the target entity. |
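Consistent hashing is why node membership changes are cheap. A toy hash ring (hash function, virtual-node count, and class layout are arbitrary here, not the platform's implementation) showing that only a fraction of entities move when a node joins:

```python
import bisect, hashlib

def h(key: str) -> int:
    """Stable hash of a string (md5 here; any stable hash works)."""
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

class HashRing:
    def __init__(self, nodes, vnodes=100):
        # Each node owns many points on the ring ("virtual nodes"),
        # which spreads its share of entities evenly.
        self.ring = sorted((h(f"{n}#{i}"), n) for n in nodes for i in range(vnodes))
        self.keys = [k for k, _ in self.ring]

    def owner(self, entity_id: str) -> str:
        # An entity belongs to the first ring point at or after its hash.
        idx = bisect.bisect(self.keys, h(entity_id)) % len(self.keys)
        return self.ring[idx][1]

devices = [f"device-{i}" for i in range(1000)]
before = HashRing(["node1", "node2", "node3"])
after = HashRing(["node1", "node2", "node3", "node4"])  # a node joins
moved = sum(before.owner(d) != after.owner(d) for d in devices)
print(f"{moved} of 1000 devices re-balanced")  # roughly a quarter, not all
```

With naive `hash(id) % node_count` assignment, adding a node would reshuffle almost every entity; the ring limits the churn to roughly `1/new_node_count` of them.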
Inter-service Communication
Services communicate through a message queue. Messages are partitioned by entity ID — all messages for a single device always land on the same partition, preserving order and enabling stateful processing.
| Queue | Use case | Persistence |
|---|---|---|
| Apache Kafka | Production clusters | Disk-based, survives restarts |
| In-Memory | Development, single-node setups | Lost on restart |
See Message Queue for the full topic topology, partitioning strategy, and producer/consumer tuning.
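The partition-by-entity-ID rule is just a deterministic hash. A sketch (the partition count and hash function are hypothetical; the real topic layout is in the Message Queue docs):

```python
import hashlib

PARTITIONS = 12  # hypothetical partition count for a rule-engine topic

def partition_for(entity_id: str) -> int:
    """Deterministic mapping: every message for the same device lands
    on the same partition, so per-device ordering is preserved."""
    digest = hashlib.sha256(entity_id.encode()).digest()
    return int.from_bytes(digest[:4], "big") % PARTITIONS

p = partition_for("device-abc")
assert all(partition_for("device-abc") == p for _ in range(100))  # stable
print(p, partition_for("device-xyz"))
```

Determinism is the point: a consumer of partition `p` sees a device's messages in order without any cross-partition coordination.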