
Kafka vs Redpanda Performance - Do the claims add up?

source link: https://jack-vanlightly.com/blog/2023/5/15/kafka-vs-redpanda-performance-do-the-claims-add-up

Do the claims add up? — Jack Vanlightly

Apache Kafka has been the most popular open source event streaming system for many years and it continues to grow in popularity. Within the wider ecosystem there are other open source and source-available competitors to Kafka, such as Apache Pulsar, NATS Streaming, Redis Streams, RabbitMQ and, more recently, Redpanda (among others).

Redpanda is a source-available Kafka clone written in C++ using the Seastar framework from ScyllaDB, a wide-column database. It uses the popular Raft consensus protocol for replication and all distributed consensus logic. Redpanda has gone to great lengths to explain that its performance is superior to Apache Kafka’s due to its thread-per-core architecture, its use of C++, and a storage design that can push high-performance NVMe drives to their limits.

They claim to be the latest stage in the evolution of distributed log systems, painting a picture of Kafka and other systems as designs for a bygone era of the spinning disk whose time has passed. They make a compelling argument not only for better performance but also for lower Total Cost of Ownership (TCO), and their benchmarks seem to back it all up.

The interesting thing about Redpanda and Kafka is that they are both distributed, append-only log systems that typically have a replication factor of 3, are both leader-follower based and use the same client protocol. Additionally, both chose to map one partition to one active segment file rather than use a shared-log storage model such as those of Apache Pulsar and Pravega. For the same throughput, they write the same amount of data to disk (to the same number of files) and they transmit the same amount of data over the network. So regarding the superior-throughput claims, the question is: given that we typically size our cloud instances based on network and disk I/O capabilities, will a CPU-optimized broker make a big difference to cost? Or can Redpanda simply write data to disk faster than Kafka? When it comes to latency, will the thread-per-core architecture make a difference? Do Kafka’s lock-based concurrent data structures represent a sizeable portion of the end-to-end latency? These are some of the questions I had when I began.
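The sizing intuition above can be made concrete with some back-of-envelope arithmetic. Here is a toy calculation (my own sketch, assuming replication factor 3, leader-balanced partitions, and ignoring consumer fan-out) of the per-broker disk and network load that any replication-factor-3 system, Kafka or Redpanda, must sustain for a given produce rate:

```python
def per_broker_load(produce_mb_s, brokers=3, rf=3):
    """Rough per-broker load for a leader-balanced cluster.

    Every byte produced is written to disk rf times across the
    cluster (once on the leader, rf - 1 times on followers), and
    enters the cluster over the network rf times (client -> leader,
    plus leader -> each follower). Consumer reads are ignored.
    """
    disk_write_mb_s = produce_mb_s * rf / brokers  # disk writes per broker
    net_in_mb_s = produce_mb_s * rf / brokers      # network ingress per broker
    return disk_write_mb_s, net_in_mb_s

disk, net = per_broker_load(1000)  # the 1 GB/s workload on 3 brokers
print(disk, net)                   # 1000.0 1000.0
```

At 1 GB/s produced, each of the three brokers must absorb roughly 1 GB/s of disk writes and 1 GB/s of network ingress regardless of which broker implementation is running, which is why the CPU-efficiency question matters less than it first appears.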

The Redpanda claims

According to their Redpanda vs Kafka benchmark and their Total Cost of Ownership analysis, if you have a 1 GB/s workload, you only need three i3en.6xlarge instances with Redpanda, whereas Apache Kafka needs nine and still delivers poor performance. These are bold claims, and they seem plausible: a broker built in C++ for modern hardware with a thread-per-core architecture sounds compelling, and it seems logical that the claims must be true. But are they?

Since joining the Apache Kafka community in February 2022, I haven’t seen any interest from the community in verifying these Redpanda claims of superior performance. Even ChatGPT seems to think Redpanda has superior performance to Kafka, because basically no-one has actually tested whether it’s true. So I decided to put Redpanda to the test myself and see if it all stands up to close scrutiny. I didn’t test nine-node Kafka clusters - all tests used three brokers, always on i3en.6xlarge instances.

I’m a distributed systems engineer with a large amount of benchmarking experience and I’ve previously contributed to the OpenMessagingBenchmark project. I’ve spent months of my life benchmarking RabbitMQ quorum queues and RabbitMQ Streams while at VMware. At Splunk I spent significant time benchmarking Apache BookKeeper as part of my performance and observability work as a BookKeeper committer. Now I work at Confluent, with a focus on Apache Kafka and our cloud service. So I’m pretty used to this kind of project, and the great thing is that the OpenMessagingBenchmark makes it pretty easy to run.

What did I find?

I ran benchmarks against Redpanda and Kafka on identical hardware: three i3en.6xlarge instances, the same client configs, and TLS either enabled for both or disabled for both.

I can tell you, now having wrapped up my benchmarking work, that Redpanda’s claims regarding Kafka, and even their own performance, are greatly exaggerated. This is probably no surprise given that all this is really just benchmarketing, but as I stated before, if no-one actually tests this stuff and writes about it, people will just start believing it. We need a reality check.

As you’ll see from this analysis, no-one should use three i3en.6xlarge instances for a 1 GB/s workload. Redpanda manages it with the given partition counts and client counts, but quickly fails to once you start making minor modifications to that workload.

I started testing Redpanda back in March and have been running the same tests against both Kafka and Redpanda since then. Some notable findings are:

  • The 1 GB/s benchmark is not at all generalizable as Redpanda performance deteriorated significantly with small tweaks to the workload, such as running it with 50 producers instead of 4.

  • Redpanda performance during their 1 GB/s benchmark deteriorated significantly when run for more than 12 hours.

  • Redpanda end-to-end latency of their 1 GB/s benchmark increased by a large amount once the brokers reached their data retention limit and started deleting segment files. Current benchmarks are based on empty drive performance.

  • Redpanda struggled when producers set record keys, which causes messages to be dispatched to partitions based on those keys, resulting in message batches that are both smaller and more numerous.

  • Redpanda was unable to push the NVMe drives to their throughput limit of 2 GB/s with acks=1 but Kafka was.

  • Kafka was able to drain large backlogs while under constant 800 MB/s or 1 GB/s producer load but Redpanda was not. Its backlogs continued to grow or entered a stable equilibrium where the backlog drained partially but never fully.
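The record-key finding is worth a quick illustration. With keys set, the producer hashes each key to choose a partition, so the records of one send window are spread across many partition batches instead of filling one. Here is a toy model (my own sketch - a simple modulo stands in for the client’s actual murmur2 key hashing, and "one batch per window" is a simplification of sticky partitioning):

```python
from collections import Counter

def batch_sizes(num_records, num_partitions, keyed):
    """Toy model of producer batching over one send window.

    Unkeyed records (sticky partitioning) all land in one partition's
    batch; keyed records are spread across partitions, so the same
    record count splits into many smaller batches.
    """
    if not keyed:
        return [num_records]  # one large batch
    counts = Counter(k % num_partitions for k in range(num_records))
    return sorted(counts.values(), reverse=True)

print(batch_sizes(1000, 100, keyed=False))      # [1000]: 1 batch of 1000
print(len(batch_sizes(1000, 100, keyed=True)))  # 100 batches of ~10 records
```

Same byte rate, but a hundredfold increase in batch count - and per-batch overheads (request handling, fsync in Redpanda’s case) are exactly what this workload change stresses.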

In all the above cases, Kafka usually outperformed Redpanda to a large degree, both reaching higher throughput and achieving lower end-to-end latency - even at the tail - on identical hardware. In other tests I ran, Redpanda outperformed Kafka (though never on throughput), so yes, Redpanda “can” be “faster” than Kafka, but the reverse is also true.

I will briefly explain these findings below with data. I will also publish deeper-dive blogs that explore the results in more depth and, where possible, discuss potential underlying reasons behind each finding.

I hope you come away with a new appreciation that trade-offs exist: there is no free lunch, whatever the implementation language or algorithms used. Optimizations exist, but you can’t optimize for everything. In distributed systems you won’t find companies or projects claiming to have optimized away the CAP theorem. Equally, we can’t optimize for high throughput, low latency, low cost, high availability and high durability all at the same time. As system builders we have to choose our trade-offs; that single silver-bullet architecture is still out there - we haven’t found it yet.

First, an audit of the benchmark code

I started with Redpanda’s OpenMessagingBenchmark (OMB) repository, as their driver doesn’t exist in the official OMB repository. Before I began, I audited the code to make sure everything looked good. Unfortunately, the Kafka side was misconfigured, with a couple of issues.

Issue #1 is that Kafka’s server.properties file has the line log.flush.interval.messages=1, which forces Kafka to fsync on each message batch. All tests, even those where this is not configured in the workload file, therefore get this fsync behavior. I have previously blogged about how Kafka uses recovery instead of fsync for safety. Setting this line degrades Kafka performance and it is relatively uncommon for people to set it, so I removed it from the server.properties file. Redpanda incorrectly claims Kafka is unsafe because it doesn’t fsync - that is not true.
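For reference, this is the offending line in the benchmark’s Kafka server.properties. Removing it restores Kafka’s default behavior, where durability comes from replication plus recovery rather than a per-batch fsync:

```properties
# Forces an fsync after every message batch; removed for these tests.
# Kafka's default (line absent) flushes via the OS page cache and
# relies on replication and log recovery for durability.
log.flush.interval.messages=1
```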

Issue #2 is that Java 11 is still used. To be fair, this is how it is in the upstream OMB repository and in the Confluent fork. However, Java 11 is now five years old, and Kafka works well with Java 17 - it especially benefits from Java 17 with TLS. So I updated the Ansible script to install Java 17 on both the clients (Redpanda’s included) and the Kafka servers.

Issue #3 is that the Redpanda driver (the clients) is hard-coded to asynchronously commit offsets every 5 seconds, whereas the Kafka driver by default asynchronously commits on each poll. So I updated the consumer in the Redpanda driver to match the Kafka driver, where the commit behavior is controlled by configs, and ensured the two matched.
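The size of this behavioral gap can be sketched with a toy consumer loop (a simplification of my own, not the actual OMB driver code): committing once per fixed 5-second interval versus once per poll produces very different commit rates, and commit traffic is load on the brokers.

```python
def commits_made(poll_interval_ms, run_ms, commit_every_poll,
                 commit_interval_ms=5000):
    """Count the async offset commits a consumer loop would issue.

    commit_every_poll=True models the Kafka driver's default
    (commit on each poll); False models the hard-coded 5-second
    timer that was in the Redpanda driver.
    """
    commits, last_commit = 0, 0
    for now in range(0, run_ms, poll_interval_ms):
        if commit_every_poll:
            commits += 1
        elif now - last_commit >= commit_interval_ms:
            commits, last_commit = commits + 1, now
    return commits

print(commits_made(100, 60_000, True))   # 600: one commit per 100 ms poll
print(commits_made(100, 60_000, False))  # 11: one commit per 5 s
```

With a 100 ms poll interval over a minute, that is 600 commits versus 11 - a roughly 50x difference in commit load between the two drivers before my change aligned them.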

I made my own copy of the repo where all these changes exist.

Critical bug fixes to OMB

Throughout my time benchmarking Kafka and Redpanda I encountered a couple of bugs in the benchmark code. The worst, which I hit a few times, caused the majority of the latency data to be lost, making high-percentile results significantly lower than they really were. The root cause: when the benchmark code attempts to collect histograms from the client machines and hits an error, it retries - but every call to get the histogram returns a copy and resets the original. By the time the second attempt occurs, the original histogram only has a few seconds of data, producing bad results. I only discovered this when doing a statistical analysis on some results that looked strange. I will be submitting a fix to the upstream OMB repository.
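The failure mode is easy to reproduce in miniature (a sketch of my own, not OMB’s actual classes): a histogram whose read operation also resets it, combined with a retry on transfer error, silently discards almost all of the recorded data.

```python
class ResettingHistogram:
    """Mimics a histogram whose snapshot call resets the underlying data."""
    def __init__(self):
        self.samples = []

    def record(self, value):
        self.samples.append(value)

    def snapshot(self):
        copy, self.samples = self.samples, []  # returns a copy AND resets
        return copy

hist = ResettingHistogram()
for latency in range(10_000):   # a long run's worth of latency samples
    hist.record(latency)

first = hist.snapshot()         # transfer fails; this copy is discarded
hist.record(1)                  # only a moment of new data trickles in
retry = hist.snapshot()         # the retry sees almost nothing

print(len(first), len(retry))   # 10000 1
```

The retry returns a near-empty histogram, so the reported percentiles are computed from a few seconds of data rather than the whole run - exactly the symptom that showed up as suspiciously low tail latencies.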

Finding 1 - 500 MB/s and 1 GB/s, 4 vs 50 producers

I ran the Redpanda 1 GB/s benchmark at six different target throughputs: 500, 600, 700, 800, 900 and 1000 MB/s - first with the original 4 producers and consumers, then with 50 producers and consumers. The result was significant performance degradation for Redpanda with 50 producers. The other noteworthy result was that Redpanda was unable to reach 1000 MB/s with TLS, which conflicts with the published Redpanda benchmarks.
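For context, producer and consumer counts in OMB are plain workload-file settings, so the 4-to-50 change is a two-line tweak. A sketch of such a workload file (field names follow the OMB workload format as I understand it; the values here are illustrative, not the exact benchmark files):

```yaml
name: 1gbps-50-producers
topics: 1
partitionsPerTopic: 100
messageSize: 1024
producersPerTopic: 50        # was 4 in the original Redpanda benchmark
subscriptionsPerTopic: 1
consumerPerSubscription: 50  # was 4
producerRate: 1000000        # records/s; ~1 GB/s at 1 KiB messages
testDurationMinutes: 60
```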

