
3M msg/sec throughput with a single TBMQ node

For MQTT brokers, sustained performance under heavy workloads is critical. This article walks through a performance test of TBMQ demonstrating its ability to handle 3 million messages per second with single-digit millisecond latency — on a single node.

A single TBMQ node was deployed within an EKS cluster alongside 3 Kafka nodes and an RDS instance.

Publishers: 100 clients, each publishing 10 msg/sec to their own topic following the pattern CountryCode/City/ID. Message size: ~66 bytes.

Subscribers: 3,000 clients, each subscribing to CountryCode/City/# to receive all messages from all publishers. The fan-out math: 100 publishers × 10 msg/sec × 3,000 subscribers = 3,000,000 delivered messages per second of total throughput.
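The fan-out arithmetic and the wildcard subscription can be sketched in a few lines. This is an illustrative snippet, not part of the TBMQ test harness; the `matches` helper is a simplified stand-in for MQTT topic filtering that only handles the multi-level `#` wildcard and the single-level `+`.

```python
def matches(topic_filter: str, topic: str) -> bool:
    """Minimal MQTT topic matching: supports '#' (multi-level) and '+' (single-level)."""
    f_parts = topic_filter.split("/")
    t_parts = topic.split("/")
    for i, part in enumerate(f_parts):
        if part == "#":
            return True  # '#' matches this level and everything below it
        if i >= len(t_parts) or (part != "+" and part != t_parts[i]):
            return False
    return len(f_parts) == len(t_parts)

publishers = 100
rate_per_publisher = 10   # msg/sec
subscribers = 3_000

published = publishers * rate_per_publisher  # 1,000 msg/sec into the broker
delivered = published * subscribers          # each message fans out to every subscriber
print(delivered)                             # 3,000,000 msg/sec out of the broker

# A subscriber filtering on 'CountryCode/City/#' matches any publisher
# topic under that prefix, e.g. 'CountryCode/City/42':
print(matches("CountryCode/City/#", "CountryCode/City/42"))
```

The key property this illustrates: the broker's outbound rate is the inbound rate multiplied by the subscriber fan-out, which is why a modest 1,000 msg/sec publish rate turns into a 3M msg/sec delivery workload.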

The test ran for 30 minutes to verify TBMQ can sustain this load without performance degradation or resource exhaustion.

| Service        | TBMQ        | AWS RDS (PostgreSQL) | Kafka     |
|----------------|-------------|----------------------|-----------|
| Instance type  | m7a.8xlarge | db.m6i.large         | m7a.large |
| vCPU           | 32          | 2                    | 2         |
| Memory (GiB)   | 128         | 8                    | 8         |
| Storage (GiB)  | 10          | 30                   | 50        |
| Network (Gbps) | 12.5        | 12.5                 | 12.5      |
| Publishers | Subscribers | Msg/sec per publisher | Total throughput | QoS | Payload  | TBMQ CPU | TBMQ memory |
|------------|-------------|-----------------------|------------------|-----|----------|----------|-------------|
| 100        | 3,000       | 10                    | 3M msg/s         | 0   | 66 bytes | 54%      | 75 GiB      |

Latency:

| Msg latency, avg | Msg latency, 95th percentile |
|------------------|------------------------------|
| 7.4 ms           | 11 ms                        |
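Figures like the average and 95th percentile above are derived from per-message latency samples. The sketch below shows one common way to compute them (the nearest-rank percentile method); the sample data is synthetic, not from the actual test.

```python
import math

def percentile(samples: list[float], p: float) -> float:
    """Nearest-rank percentile: smallest sample covering p% of the distribution."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]

# Synthetic per-message latencies in milliseconds (illustration only).
latencies_ms = [5.1, 6.3, 7.0, 7.4, 7.9, 8.2, 9.0, 10.5, 11.0, 12.3]

avg = sum(latencies_ms) / len(latencies_ms)
p95 = percentile(latencies_ms, 95)
print(f"avg={avg:.1f} ms, p95={p95:.1f} ms")
```

In a real run the samples would be timestamps recorded at publish and receive time on the subscriber side, aggregated across all runners.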

At 54% CPU utilization, TBMQ demonstrates substantial remaining processing capacity, suggesting it can efficiently handle even higher workloads and message delivery peaks.

Alternative instance: tests on m7a.4xlarge (16 vCPU, 64 GiB) achieved 14.2 ms average latency at 90% CPU, highlighting TBMQ’s flexibility across different instance types.

The test agent is a cluster of performance test nodes (runners) supervised by an orchestrator. For this test, the agent consisted of 1 publisher pod, 6 subscriber pods, and 1 orchestrator pod — each on a separate AWS EC2 instance.

Subscriber clients set up their subscriptions; publishers ran a warm-up phase. Once all runners reported ready to the orchestrator, message publishing began. Monitoring was performed using JMX, htop, Kafka UI, and AWS CloudWatch.
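The start-gate pattern described above — runners report ready, and publishing begins only once every runner has checked in — can be sketched with a barrier. This is a hedged illustration, not the agent's actual protocol: a `threading.Barrier` stands in for the orchestrator's coordination channel.

```python
import threading
import time

RUNNERS = 7  # e.g. 1 publisher pod + 6 subscriber pods
started_at = {}

# All runners block at the barrier until every one of them has arrived.
barrier = threading.Barrier(RUNNERS)

def runner(name: str) -> None:
    # ... set up subscriptions / run the warm-up phase here ...
    barrier.wait()                        # report ready to the "orchestrator"
    started_at[name] = time.monotonic()   # publishing phase starts together

threads = [threading.Thread(target=runner, args=(f"runner-{i}",))
           for i in range(RUNNERS)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Every runner passed the gate within a narrow window:
spread = max(started_at.values()) - min(started_at.values())
print(f"{len(started_at)} runners started within {spread * 1000:.1f} ms")
```

The point of the gate is measurement hygiene: latency numbers are only meaningful if no publisher starts before all subscriptions are in place.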

Refer to the AWS cluster installation guide for deployment instructions.

TBMQ successfully processed 3M messages per second with an average latency of 7.4 ms on a single node, confirming its position as a scalable MQTT broker ready for demanding fan-out workloads. Follow the project on GitHub for further performance results.