source link: https://devm.io/databases/apache-cassandra-4-1-nosql

Interview with Mick Semb Wever, Apache Cassandra PMC Chair

Apache Cassandra 4.1: "Making Memtables pluggable opens up for some interesting and novel ideas."

14. Dec 2022


We spoke with Mick Semb Wever, Apache Cassandra PMC Chair, to learn all about the newest release of Apache Cassandra, an open source NoSQL database. Mick spoke about what Apache Cassandra excels at, who is using it, and what the new release adds, and he explained concepts such as lightweight transactions.

devmio: Thank you for taking the time to answer all of our questions. For newcomers, could you please explain what Apache Cassandra is?

Mick Semb Wever: Apache Cassandra is an open source, highly performant, distributed NoSQL database, charting a path to a more cloud native future and enabling an expanded ecosystem. We’ve been around for a long time. Apache Cassandra is trusted by thousands of companies including large corporations and small startups. Linear scalability and proven fault-tolerance on commodity hardware or cloud infrastructure make it an excellent platform for mission-critical data.

devmio: What does Apache Cassandra excel in, and what is it best used for?

Mick Semb Wever: Apache Cassandra is best used for managing and processing large volumes of data. Its masterless architecture and low latency mean Cassandra can withstand catastrophic outages with no data loss. Therefore, Apache Cassandra is best suited for applications that can’t afford to lose data.

Just to name-drop a little, Apache Cassandra is used in organizations of all sizes including Apple, Backblaze, Bloomberg Engineering, Home Depot, Netflix, Target, Yelp, and thousands of other companies that have large, active data sets. Check out some of our case studies here.


devmio: The 4.1 release of Apache Cassandra is here! What is the new Memtable API and what does it provide?

Mick Semb Wever: Memtables are where writes first sit in-memory in Cassandra, backed by the commitlog for durability, before being flushed to disk as SSTables.
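To make the write path concrete, here is a toy single-node model in Python: each write is appended to a commitlog, applied to an in-memory memtable, and, once the memtable reaches a size threshold, flushed as a sorted, immutable SSTable. Everything here is an assumption for illustration (the `Node` class, the entry-count `flush_threshold`, Python lists standing in for files); real Cassandra decides when to flush based on factors such as memory use.

```python
import bisect


class Node:
    """Toy write path: commitlog -> memtable -> SSTable (hypothetical model)."""

    def __init__(self, flush_threshold: int = 3):
        self.commitlog = []      # append-only log, replayed after a crash
        self.memtable = {}       # recent writes held in memory
        self.sstables = []       # immutable sorted runs, oldest first
        self.flush_threshold = flush_threshold

    def write(self, key, value):
        self.commitlog.append((key, value))   # 1. durability first
        self.memtable[key] = value            # 2. then the in-memory table
        if len(self.memtable) >= self.flush_threshold:
            self.flush()

    def flush(self):
        # 3. Persist the memtable as a sorted, immutable SSTable
        self.sstables.append(sorted(self.memtable.items()))
        self.memtable = {}
        self.commitlog = []  # flushed data no longer needs its log entries

    def read(self, key):
        if key in self.memtable:
            return self.memtable[key]
        # Search SSTables from newest to oldest so the latest write wins
        for table in reversed(self.sstables):
            i = bisect.bisect_left(table, (key,))
            if i < len(table) and table[i][0] == key:
                return table[i][1]
        return None
```

Reads check the memtable first and then the SSTables from newest to oldest, so the most recent write for a key wins.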

Until now, Memtables have been B-tree implementations, with options to move objects and indexes off-heap. Because Memtables hold long-lived objects in the JVM, they have always been a challenging aspect of Cassandra: long-tail latencies often correlate with GC pauses.

Making Memtables pluggable opens up for some interesting and novel ideas, like using persistent memory, something requested and worked on by Intel.
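Pluggability means the storage engine talks to the memtable only through a contract, so implementations can be swapped without touching the write path. A minimal Python sketch of that idea: an abstract `Memtable` with `put` and `sorted_items` (what a flush needs), two interchangeable implementations, and a registry that picks one by name, for example per table. All names are hypothetical; this is not Cassandra's actual Java interface or its configuration syntax.

```python
import bisect
from abc import ABC, abstractmethod


class Memtable(ABC):
    """Hypothetical pluggable memtable contract."""

    @abstractmethod
    def put(self, key: bytes, value: bytes) -> None: ...

    @abstractmethod
    def sorted_items(self) -> list:
        """Return (key, value) pairs in key order, ready to flush."""


class DictMemtable(Memtable):
    """Hash map: cheap inserts, sorts only at flush time."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        self._data[key] = value

    def sorted_items(self):
        return sorted(self._data.items())


class SortedListMemtable(Memtable):
    """Keeps keys sorted on every insert."""

    def __init__(self):
        self._keys, self._values = [], []

    def put(self, key, value):
        i = bisect.bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            self._values[i] = value          # overwrite existing key
        else:
            self._keys.insert(i, key)
            self._values.insert(i, value)

    def sorted_items(self):
        return list(zip(self._keys, self._values))


# Hypothetical registry: implementations chosen by name (e.g. per table)
MEMTABLE_IMPLEMENTATIONS = {
    "dict": DictMemtable,
    "sorted_list": SortedListMemtable,
}


def new_memtable(name: str) -> Memtable:
    return MEMTABLE_IMPLEMENTATIONS[name]()
```

The two classes make different trade-offs, but code that only uses the `Memtable` contract cannot tell them apart, which is what lets a new implementation be dropped in.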

In 5.0 we see a new implementation based on tries, already in trunk. Trie-based approaches have been tried in other databases without success; the implementation in Cassandra includes a number of novel solutions. Benchmarking already demonstrates a doubling of write throughput (and a halving of latencies). This implementation is intended to replace the existing one, but with pluggability in place, users will always be able to switch if they find themselves in an edge case where an alternative or older implementation is still the right choice for them.
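For readers unfamiliar with tries: a trie stores keys as paths of bytes, so keys that share a prefix share nodes, and a depth-first walk that visits children in byte order yields the keys already sorted, which is the order a flush to SSTables needs. The toy `TrieMap` below shows only that basic property, assuming keys are byte strings whose byte order is the desired sort order; it says nothing about the concurrency, memory layout, or novel techniques of Cassandra's trie Memtable.

```python
class TrieNode:
    __slots__ = ("children", "value", "has_value")

    def __init__(self):
        self.children = {}      # byte (int 0..255) -> TrieNode
        self.value = None
        self.has_value = False


class TrieMap:
    """Toy byte-keyed trie map that iterates keys in sorted byte order."""

    def __init__(self):
        self.root = TrieNode()

    def put(self, key: bytes, value) -> None:
        node = self.root
        for b in key:  # iterating bytes yields ints
            node = node.children.setdefault(b, TrieNode())
        node.value, node.has_value = value, True

    def get(self, key: bytes):
        node = self.root
        for b in key:
            node = node.children.get(b)
            if node is None:
                return None
        return node.value if node.has_value else None

    def items(self):
        """Pre-order depth-first walk: a prefix comes before its extensions."""
        stack = [(self.root, b"")]
        while stack:
            node, prefix = stack.pop()
            if node.has_value:
                yield prefix, node.value
            # Push children in reverse so the smallest byte is visited first
            for b in sorted(node.children, reverse=True):
                stack.append((node.children[b], prefix + bytes([b])))
```

For example, after inserting `b"cat"`, `b"car"`, and `b"ca"`, the three keys share the `c` and `a` nodes, and `items()` returns them as `b"ca"`, `b"car"`, `b"cat"` without a separate sort step.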

devmio: What are lightweight transactions (LWT)?

Mick Semb Wever: Paxos.

Paxos is a long-established consensus protocol and was adopted by Cassandra in 2013 for what were called “lightweight transactions.” They are lightweight because they isolate a data change to a single partition within a transaction; transactions spanning more than one table or partition are not an option. In addition, Paxos requires multiple round trips to reach consensus, which adds a lot of latency and fine print about when to use lightweight transactions in your application.

More technical details about how the implementation extends classic Paxos can be read here and here.
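In CQL, a lightweight transaction is a write carrying a condition, such as `INSERT ... IF NOT EXISTS` or `UPDATE ... IF column = value`. The Python sketch below models that single-partition compare-and-set with classic Paxos and counts the round trips it uses: prepare/promise, read/results, propose/accept, and commit/acknowledge, the four phases commonly described for Cassandra's original LWT implementation. Integer ballots, in-memory replicas that never fail, and the `Replica`/`compare_and_set` names are assumptions of the sketch; it is not Cassandra's code and does not model Paxos v2.

```python
from dataclasses import dataclass
from typing import Any, List, Optional, Tuple


@dataclass
class Replica:
    """Paxos state one replica keeps for a single partition (toy model)."""
    promised: int = 0          # highest ballot promised so far
    accepted_ballot: int = 0   # ballot of the most recently accepted proposal
    accepted_value: Any = None
    committed_ballot: int = 0  # ballot of the most recently committed proposal
    value: Any = None          # the partition's committed value

    def prepare(self, ballot: int) -> Optional[Tuple[int, Any, int]]:
        if ballot <= self.promised:
            return None        # reject: an equal or newer ballot was promised
        self.promised = ballot
        return self.accepted_ballot, self.accepted_value, self.committed_ballot

    def accept(self, ballot: int, value: Any) -> bool:
        if ballot < self.promised:
            return False
        self.promised = ballot
        self.accepted_ballot, self.accepted_value = ballot, value
        return True

    def commit(self, ballot: int, value: Any) -> None:
        if ballot > self.committed_ballot:
            self.committed_ballot, self.value = ballot, value


def compare_and_set(replicas: List[Replica], ballot: int,
                    expected: Any, new_value: Any) -> Tuple[bool, int]:
    """Apply new_value only if the partition holds expected.

    Returns (applied, round_trips_used).
    """
    quorum = len(replicas) // 2 + 1
    trips = 1  # round trip 1: prepare / promise
    granted = []
    for r in replicas:
        promise = r.prepare(ballot)
        if promise is not None:
            granted.append((r, promise))
    if len(granted) < quorum:
        return False, trips

    # Finish any accepted-but-uncommitted proposal before proposing our own
    promises = [p for _, p in granted]
    highest_accepted = max(promises, key=lambda p: p[0])
    highest_committed = max(p[2] for p in promises)
    if highest_accepted[0] > highest_committed:
        trips += 1  # propose the in-progress value under our ballot
        acks = [r for r in replicas if r.accept(ballot, highest_accepted[1])]
        if len(acks) >= quorum:
            trips += 1  # commit it
            for r in replicas:
                r.commit(ballot, highest_accepted[1])
        return False, trips  # our value was not applied; caller retries

    trips += 1  # round trip 2: read / results (newest committed value wins)
    current = max((r for r, _ in granted), key=lambda r: r.committed_ballot).value
    if current != expected:
        return False, trips  # condition not met, nothing proposed

    trips += 1  # round trip 3: propose / accept
    if sum(r.accept(ballot, new_value) for r in replicas) < quorum:
        return False, trips

    trips += 1  # round trip 4: commit / acknowledge
    for r in replicas:
        r.commit(ballot, new_value)
    return True, trips
```

A successful conditional write takes four round trips here, while a failed condition stops after two. A call can also return without applying its own value when it first has to complete another coordinator's unfinished proposal; the caller then retries with a higher ballot.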


devmio: How does the new Guardrails framework work and what are its use cases?

Mick Semb Wever: This release sees new tools to assist operators in managing risk, controlling user access, and maintaining performance. A good example of this type of feature is the new Guardrails Framework. The Guardrails Framework enforces soft and hard limits system-wide, disables certain features, and disallows specific values. The framework exists to help operators avoid particular configuration and usage pitfalls that can degrade the performance and availability of an Apache Cassandra cluster when taken to scale. As well as activating available guardrails, developers can use the framework to create their own guardrails.
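The three kinds of guardrail described above (soft and hard thresholds, features that can be disabled, and disallowed values) can be sketched as small Python checks. The class names, messages, and the use of `-1` to switch a threshold off are assumptions for illustration, not the names or `cassandra.yaml` settings of Cassandra's Guardrails Framework.

```python
from dataclasses import dataclass, field


class GuardrailViolation(Exception):
    """Raised when a hard limit is hit or a disabled feature/value is used."""


@dataclass
class Threshold:
    name: str
    warn: int = -1  # soft limit; -1 turns it off (assumption of this sketch)
    fail: int = -1  # hard limit; -1 turns it off

    def check(self, value: int, warnings: list) -> None:
        if self.fail >= 0 and value > self.fail:
            raise GuardrailViolation(f"{self.name}: {value} exceeds hard limit {self.fail}")
        if self.warn >= 0 and value > self.warn:
            warnings.append(f"{self.name}: {value} exceeds soft limit {self.warn}")


@dataclass
class EnableFlag:
    name: str
    enabled: bool = True

    def check(self) -> None:
        if not self.enabled:
            raise GuardrailViolation(f"{self.name} is disabled")


@dataclass
class DisallowedValues:
    name: str
    disallowed: set = field(default_factory=set)

    def check(self, value) -> None:
        if value in self.disallowed:
            raise GuardrailViolation(f"{self.name}: {value!r} is not allowed")
```

A soft limit only collects a warning to return to the client, whereas a hard limit, a disabled feature, or a disallowed value rejects the operation outright.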

devmio: What performance improvements can we expect from the newest release?

Mick Semb Wever: All LWT (Paxos) requests now take half the number of network hops. For cross-DC LWTs, this can save hundreds of milliseconds. This is by far the most significant improvement. Not many releases can offer a halving of network traffic, and therefore latency, for core features in a database.

The new version of Paxos performs especially well in WAN settings and under contention.
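To see why halving the round trips matters most across datacenters, note that a network-bound LWT costs roughly round trips × round-trip time. The arithmetic below assumes four round trips for a classic LWT write (the commonly cited figure) and half that for Paxos v2, with a hypothetical 100 ms cross-DC round-trip time; it ignores local processing and queuing.

```python
def lwt_latency_ms(round_trips: int, rtt_ms: float) -> float:
    """Network-bound LWT latency: number of round trips times round-trip time."""
    return round_trips * rtt_ms


RTT_MS = 100  # hypothetical cross-datacenter round-trip time
classic = lwt_latency_ms(4, RTT_MS)   # assumed 4 round trips for a classic LWT write
paxos_v2 = lwt_latency_ms(2, RTT_MS)  # half as many round trips
print(f"classic: {classic} ms, Paxos v2: {paxos_v2} ms, saved: {classic - paxos_v2} ms")
# prints: classic: 400 ms, Paxos v2: 200 ms, saved: 200 ms
```

Within a single datacenter, where round trips take on the order of a millisecond, the same halving saves far less in absolute terms.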

devmio: Are there any other notable changes in Apache Cassandra 4.1 that you would like to highlight?

Mick Semb Wever:

  • Paxos v2
  • Guardrails
  • CQL (Native Protocol) Rate Limiting
  • Client-side Password Hashing
  • Partition Denylist
  • Pluggability: Memtable Encryption, Authentication


devmio: Not to get ahead of ourselves, but could you share any upcoming plans for the next release? Is there anything planned for the roadmap ahead that users should anticipate?

Mick Semb Wever:

  • Accord ACID Transactions
  • Trie Memtables
  • Trie-indexed SSTables
  • JDK 17

devmio: How can people contribute to the open source Apache Cassandra community, or keep an eye on future developments?

Mick Semb Wever: There are many ways to get involved! We strive to be open, empathetic, welcoming, friendly, and patient.

To ask broad, opinion-based questions, join general discussions, find out how to get help, or receive announcements, subscribe to the user mailing list: user@cassandra.apache.org

To meet other users and developers, participate in general discussions and get involved with the project, join our Slack channel: https://infra.apache.org/slack.html

For contributor discussions related to the development of the Cassandra project, join the developer mailing list: dev@cassandra.apache.org

To join the following Slack channels, first sign up for a Slack account here: https://s.apache.org/slack-invite

  • #CASSANDRA-DEV is strictly for questions or discussions related to Cassandra development.
  • #CASSANDRA-BUILDS is for results of automated test builds.
  • #CASSANDRA-BUILDS-PATCHES is for results of patch test builds.

Full information is available here: https://cassandra.apache.org/_/community.html

Mick Semb Wever

Mick is part of a global team that provides expert advice on Apache Cassandra and global data platforms: from feasibility and prototype demonstrations to complex enterprise designs built for innovation and minimum TCO, down to deep performance tuning, diagnostics, and instrumentation of distributed technologies, operating systems, and hardware. Mick has been an Apache Cassandra community member since 2010.

