
Architecture Overview

ThingsBoard is a multi-tenant IoT platform designed to scale horizontally. Add more instances of any service to handle more load. There is no single point of failure — the system continues operating when individual instances go down.

ThingsBoard relies on the following third-party components:

| Component | Role | Required? |
|---|---|---|
| PostgreSQL | Entity storage (devices, users, dashboards, rule chains) | Always required |
| Apache Kafka | Message queue between services in cluster mode | Production clusters |
| Redis / Valkey | Shared cache for credentials, sessions, profiles | Multi-node deployments |
| Apache Cassandra | Time-series storage (Hybrid mode) | High-throughput production |
| Zookeeper | Service discovery, partition assignment | Cluster mode |
| HAProxy / nginx | Load balancer for HTTP, MQTT, CoAP | Cluster mode |

Single-node deployments require only PostgreSQL. All other components are added as you scale.

ThingsBoard stores entities and attributes in PostgreSQL. For time-series data, you can choose between two strategies:

| Strategy | Time-series storage | Write throughput | Recommended for |
|---|---|---|---|
| SQL | PostgreSQL | ≤5–10K data points/sec | Small deployments, simplicity |
| Hybrid | Cassandra | ~1M data points/sec | High-throughput production |

See Database Layer for write batching, partitioning strategies, Cassandra tuning, and storage efficiency comparison.

ThingsBoard supports two deployment architectures:

| Mode | Description | Best for |
|---|---|---|
| Monolithic | All services in a single Java process | Development, small-to-medium workloads |
| Microservices | Each service runs as a separate container | Production, horizontal scaling, HA |

Both modes can run as a single server or as a cluster. See Deployment Scenarios for sizing guidance and Performance for benchmark results.

Tenant isolation is enforced at every layer of the platform. The entity hierarchy — Tenant → Customer → Devices & Assets — determines access boundaries. Each tenant has independent rule chains, dashboards, device profiles, and user roles. System resources (database partitions, message queue topics, cache entries) are partitioned per tenant, so one tenant’s traffic spike cannot affect another.

Tenant profiles define per-tenant resource limits and quotas:

| Category | What it controls |
|---|---|
| Entity limits | Maximum devices, assets, dashboards, users, and other entities per tenant |
| API limits | Rule engine executions and transport messages per second |
| Rate limits | Per-transport and per-device message rate limiting (see Rate Limiting below) |
| Data retention (TTL) | Per-tenant time-to-live for telemetry, alarms, RPC, audit logs |
| WebSocket limits | Maximum sessions and subscriptions per tenant and customer |
| Isolated queues | Dedicated Rule Engine Queues per tenant for performance isolation |
| Notifications | Email/SMS quotas, alarm quotas |
| Calculated fields | Maximum calculated fields per entity, arguments, rolling data points |

ThingsBoard uses different authentication mechanisms for devices and users.

Device authentication happens on every incoming message. When a device connects via MQTT, HTTP, or CoAP, the Transport layer validates its credentials before accepting any data. Supported credential types:

| Credential Type | Protocol | How it works |
|---|---|---|
| Access Token | MQTT, HTTP, CoAP | String token sent as MQTT username or HTTP/CoAP query parameter |
| MQTT Basic | MQTT | Client ID + username + password combination |
| X.509 Certificate | MQTT (TLS) | Device presents a client certificate during TLS handshake |

Credentials are cached after the first lookup — subsequent messages from the same device are validated from cache without a database query. This is critical for throughput: at 100K messages/second, each credential lookup hitting the database would overwhelm PostgreSQL.
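The cache-aside pattern described above can be sketched as follows. This is a minimal illustration, not ThingsBoard's actual implementation; `CredentialCache` and `db_lookup` are invented names for the demo.

```python
import time

class CredentialCache:
    """Cache-aside lookup: hit the database only on a cache miss.

    `db_lookup` stands in for the real credential query; all names
    here are illustrative, not ThingsBoard's internal classes.
    """
    def __init__(self, db_lookup, ttl_seconds=300):
        self._db_lookup = db_lookup
        self._ttl = ttl_seconds
        self._entries = {}          # token -> (device_id, expires_at)
        self.db_queries = 0         # instrumentation for the demo

    def authenticate(self, token):
        entry = self._entries.get(token)
        if entry is not None and entry[1] > time.monotonic():
            return entry[0]                      # cache hit: no DB query
        self.db_queries += 1
        device_id = self._db_lookup(token)       # cache miss: one DB query
        if device_id is not None:
            self._entries[token] = (device_id, time.monotonic() + self._ttl)
        return device_id

# Demo: 1,000 messages from the same device cause a single DB query.
cache = CredentialCache(db_lookup=lambda t: "device-42" if t == "A1_TOKEN" else None)
for _ in range(1000):
    assert cache.authenticate("A1_TOKEN") == "device-42"
print(cache.db_queries)  # → 1
```

In a multi-node deployment the same pattern applies, with Redis/Valkey as the shared store instead of a per-process dict.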

User authentication uses JWT tokens for REST API and WebSocket connections. Users authenticate once via login endpoint, receive a JWT token pair (access + refresh), and include the access token in subsequent API requests. Token validation is stateless — the server verifies the JWT signature without a database lookup.
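Stateless validation means the server can accept or reject a token using only the signing key. A hand-rolled HS256 sketch (illustrative only; the secret, claims, and helper names are invented for the demo):

```python
import base64, hashlib, hmac, json, time

SECRET = b"server-side-signing-key"  # illustrative; real deployments configure this

def b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()

def b64url_decode(s: str) -> bytes:
    return base64.urlsafe_b64decode(s + "=" * (-len(s) % 4))

def issue_token(user_id: str, ttl: int = 900) -> str:
    header = b64url(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    payload = b64url(json.dumps({"sub": user_id, "exp": int(time.time()) + ttl}).encode())
    signing_input = f"{header}.{payload}".encode()
    sig = b64url(hmac.new(SECRET, signing_input, hashlib.sha256).digest())
    return f"{header}.{payload}.{sig}"

def verify_token(token: str):
    """Stateless check: signature plus expiry, no database lookup."""
    header, payload, sig = token.split(".")
    signing_input = f"{header}.{payload}".encode()
    expected = b64url(hmac.new(SECRET, signing_input, hashlib.sha256).digest())
    if not hmac.compare_digest(sig, expected):
        return None
    claims = json.loads(b64url_decode(payload))
    return claims["sub"] if claims["exp"] > time.time() else None

token = issue_token("tenant-admin@example.com")
print(verify_token(token))  # valid signature and not expired → user id
```

The refresh token exists precisely because access tokens are short-lived: when the access token expires, the client exchanges the refresh token for a new pair instead of re-entering credentials.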

When a device sends data, it follows the same path regardless of deployment mode (monolithic or microservices):

  1. Connect — Device opens a connection to the Transport layer (MQTT, HTTP, CoAP, or LwM2M).
  2. Resolve tenant — Transport identifies the owning tenant from device credentials.
  3. Authenticate — Transport validates device credentials against the Core service. Invalid credentials are rejected immediately. Credentials are cached to avoid a database query on every message.
  4. Push to Rule Engine — Transport converts the protocol-specific payload into a unified message and forwards it to the Rule Engine via the message queue.
  5. Execute Rule Chains — The Rule Engine processes the message through the tenant’s configured rule chains — filtering, enriching, transforming, and triggering actions.
  6. Persist — Action nodes save telemetry, attributes, or alarms to the database.
  7. Notify — Core pushes real-time updates to WebSocket subscribers (dashboards).
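Step 4 (the unified message) is the reason the rest of the pipeline is protocol-agnostic. A toy conversion, assuming invented field names (the real queue messages are protobuf-encoded and shaped differently):

```python
import json, time

def to_unified_msg(tenant_id: str, device_id: str, payload: bytes) -> dict:
    """Parse a protocol-specific payload and wrap it in a queue message.
    Field names here are illustrative, not ThingsBoard's actual schema."""
    values = json.loads(payload)
    return {
        "type": "POST_TELEMETRY_REQUEST",   # same shape for MQTT, HTTP, CoAP
        "tenantId": tenant_id,
        "originator": device_id,
        "ts": int(time.time() * 1000),      # server-side timestamp, ms
        "data": values,
    }

# The same reading produces the same message whether it arrived over
# MQTT, HTTP, or CoAP; only the transport-specific parsing differs.
msg = to_unified_msg("tenant-1", "device-42", b'{"temperature": 21.5}')
print(msg["type"], msg["data"])
```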

The reverse data path — server to device — follows a different route. When a user or rule chain sends a command to a device:

  1. Initiate — A REST API call or rule chain action node creates an RPC request for a specific device.
  2. Route to owner — The Core service identifies which ThingsBoard node owns the target device’s actor (via consistent hashing). In a cluster, the request is forwarded via gRPC to the correct node.
  3. Device actor — The device’s actor receives the RPC request, checks whether the device has an active session, and forwards the command to the Transport layer via the message queue.
  4. Deliver — The Transport layer sends the command to the device over the device’s active connection (MQTT publish, CoAP response, etc.).
  5. Response — The device responds (optional), and the response flows back through the same path in reverse.

If the device is offline, the behavior depends on RPC type: one-way RPC is fire-and-forget, while two-way RPC waits for a response with a configurable timeout.
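The consistent hashing mentioned in step 2 can be sketched with a classic hash ring. This is an illustration of the technique, not ThingsBoard's actual partition service; node names and the virtual-node count are invented:

```python
import bisect, hashlib

def _hash(key: str) -> int:
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")

class HashRing:
    """Consistent-hash ring with virtual nodes: finds which cluster
    node owns a given device actor."""
    def __init__(self, nodes, vnodes=100):
        self._ring = sorted((_hash(f"{n}#{i}"), n) for n in nodes for i in range(vnodes))
        self._hashes = [h for h, _ in self._ring]

    def owner(self, device_id: str) -> str:
        # first ring position at or after the device's hash, wrapping around
        idx = bisect.bisect(self._hashes, _hash(device_id)) % len(self._ring)
        return self._ring[idx][1]

devices = [f"device-{i}" for i in range(1000)]
ring3 = HashRing(["tb-node-1", "tb-node-2", "tb-node-3"])
ring4 = HashRing(["tb-node-1", "tb-node-2", "tb-node-3", "tb-node-4"])
moved = sum(1 for d in devices if ring3.owner(d) != ring4.owner(d))
print(f"{moved} of 1000 devices re-homed after adding a 4th node")
```

The point of the technique: adding the fourth node re-homes only roughly a quarter of the devices, instead of reshuffling everything as `hash(id) % N` would.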

Dashboards and the REST API use WebSocket connections for real-time updates. When a user opens a dashboard:

  1. Connect — The browser opens a WebSocket connection to a ThingsBoard Node.
  2. Subscribe — The dashboard sends subscription commands for specific entity attributes or time-series keys (e.g., “subscribe to device ABC, key temperature”).
  3. Register — The TB Node registers the subscription in the subscription manager. In a cluster, subscriptions are tracked per-node, and subscription notifications flow via the tb_core.notifications Kafka topic.
  4. Update — When new telemetry arrives (from a device, rule chain, or API), the node processing the telemetry detects matching subscriptions and pushes the update to the subscriber’s WebSocket session.

This means real-time updates work across cluster nodes — the device’s telemetry may be processed by node A, while the dashboard is connected to node B. The notification flows through Kafka to ensure delivery.
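The subscribe-then-push mechanism above reduces to a registry keyed by (entity, key). A minimal single-node sketch with invented names:

```python
from collections import defaultdict

class SubscriptionManager:
    """Register dashboard subscriptions, then push matching telemetry
    to subscribers. Illustrative only; in a cluster the delivery hop
    to another node goes through the notifications topic."""
    def __init__(self):
        self._subs = defaultdict(set)   # (entity_id, key) -> session ids
        self.delivered = []             # (session, entity, key, value), for the demo

    def subscribe(self, session_id, entity_id, key):
        self._subs[(entity_id, key)].add(session_id)

    def on_telemetry(self, entity_id, key, value):
        for session_id in self._subs.get((entity_id, key), ()):
            self.delivered.append((session_id, entity_id, key, value))

mgr = SubscriptionManager()
mgr.subscribe("ws-session-1", "device-ABC", "temperature")
mgr.on_telemetry("device-ABC", "temperature", 21.5)  # matches → pushed
mgr.on_telemetry("device-ABC", "humidity", 40)       # no subscriber → dropped
print(mgr.delivered)
```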

ThingsBoard enforces rate limits at multiple levels to protect against traffic spikes and ensure fair resource sharing:

| Level | Where enforced | What it controls |
|---|---|---|
| Transport | Transport layer | Messages per second per device and per tenant (MQTT, HTTP, CoAP, LwM2M) |
| REST API | Core service | API calls per second per tenant and per customer |
| Rule Engine | Rule Engine | Rule engine messages per second per tenant |
| WebSocket | Core service | Subscription updates per second per session |
| Notifications | Core service | Email/SMS per hour per tenant |

Rate limit configuration is set per tenant profile. When a limit is exceeded, the platform rejects additional messages with an appropriate error (e.g., MQTT CONNACK with “rate limit” reason, HTTP 429 Too Many Requests). Rate limit counters are stored in the cache (the rateLimits namespace) for fast, distributed enforcement across cluster nodes.
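A token-bucket limiter is one common way to implement such limits; the sketch below is illustrative, not ThingsBoard's implementation (which tracks multiple windows per limit, e.g. per-second and per-minute, and shares counters via the cache):

```python
import time

class TokenBucket:
    """Single-bucket rate limiter sketch. Capacity bounds the burst;
    refill_per_sec bounds the sustained rate."""
    def __init__(self, capacity: int, refill_per_sec: float):
        self.capacity = capacity
        self.tokens = float(capacity)
        self.refill = refill_per_sec
        self.last = time.monotonic()

    def try_acquire(self) -> bool:
        now = time.monotonic()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.refill)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False  # caller rejects: MQTT "rate limit" CONNACK, HTTP 429, etc.

limiter = TokenBucket(capacity=5, refill_per_sec=5)
results = [limiter.try_acquire() for _ in range(10)]
print(results.count(True))  # the burst above capacity is rejected
```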

ThingsBoard is split into the following service groups:

| Service | Responsibility | Protocols |
|---|---|---|
| Transports | Accept device connections, convert protocol-specific payloads into a unified format, push messages to the rule engine. Each protocol runs in its own process. | MQTT, HTTP, CoAP, LwM2M, SNMP |
| ThingsBoard Node | Core service — REST API, WebSocket subscriptions, Rule Engine, Actor System, device connectivity state, entity management | HTTP/REST, WebSocket |
| JS Executor | Execute user-defined JavaScript from rule engine script nodes in isolated sandboxes. Deploy 20+ instances in production. | Internal (Kafka) |
| Web UI | Serve the Angular-based dashboard application, proxy REST/WS to ThingsBoard Node | HTTP |

Integration Executor — runs platform integrations (OPC-UA, SigFox, TheThingsNetwork, etc.), pulling data from external systems and pushing it into ThingsBoard via the rule engine. Available in Professional Edition and Cloud only.

Transport services are protocol-specific — each protocol runs in its own process (or pod in Kubernetes). See the individual API references for protocol details: MQTT API, HTTP API, CoAP API, LwM2M API.

These components work identically in both monolithic and microservices deployments:

| Component | Purpose |
|---|---|
| Actor System | Manages isolated per-entity state (device actors, rule chain actors, tenant actors). Each actor processes messages sequentially, eliminating concurrency issues. |
| Cache Layer | Caches device credentials, entity profiles, attributes, and sessions to avoid database queries on every message. Supports Caffeine (local) and Redis (shared). |
| Service Discovery | In cluster mode, Zookeeper tracks which nodes are alive. Single-node deployments skip Zookeeper entirely. |
| Consistent Hashing | Distributes entities across cluster nodes deterministically. When a node joins or leaves, only a fraction of entities re-balance. |
| gRPC | Inter-node communication in cluster mode. Forwards messages to the node that owns the target entity. |
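The "processes messages sequentially" property of the Actor System is what removes locking from entity state. A minimal mailbox-and-thread sketch (illustrative names; real actor frameworks multiplex many actors over a thread pool rather than one thread per actor):

```python
import queue, threading

class DeviceActor:
    """Per-entity actor sketch: state is private to the actor and its
    mailbox is drained by a single consumer, so message handling is
    strictly sequential and the state needs no locks."""
    def __init__(self, device_id: str):
        self.device_id = device_id
        self.last_telemetry = None                 # actor-private state
        self._mailbox = queue.Queue()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def tell(self, msg):
        """Other threads never touch state directly; they only enqueue."""
        self._mailbox.put(msg)

    def _run(self):
        while True:
            msg = self._mailbox.get()              # one message at a time
            if msg is None:                        # shutdown sentinel
                return
            self.last_telemetry = msg

actor = DeviceActor("device-42")
for i in range(100):
    actor.tell({"seq": i})
actor.tell(None)
actor._thread.join(timeout=5)
print(actor.last_telemetry)  # → {'seq': 99}
```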

Services communicate through a message queue. Messages are partitioned by entity ID — all messages for a single device always land on the same partition, preserving order and enabling stateful processing.
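The partition-by-entity rule can be sketched in a few lines; the hash function and partition count are illustrative, not the platform's actual values:

```python
import hashlib

NUM_PARTITIONS = 12  # illustrative; real topic partition counts are configurable

def partition_for(entity_id: str) -> int:
    """Hash-based partition selection: the same entity always maps to
    the same partition, so its messages stay in order."""
    digest = hashlib.md5(entity_id.encode()).digest()
    return int.from_bytes(digest[:4], "big") % NUM_PARTITIONS

# Every message from one device lands on one partition...
assert len({partition_for("device-42") for _ in range(1000)}) == 1
# ...while different devices spread across partitions.
spread = {partition_for(f"device-{i}") for i in range(1000)}
print(sorted(spread))
```

Because a partition is consumed by one consumer at a time, this also gives each device a single, ordered processing stream.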

| Queue | Use case | Persistence |
|---|---|---|
| Apache Kafka | Production clusters | Disk-based, survives restarts |
| In-Memory | Development, single-node setups | Lost on restart |

See Message Queue for the full topic topology, partitioning strategy, and producer/consumer tuning.