Whitepaper: Couchbase vs RavenDB Performance at Rakuten Kobo

Comments

09 Apr 2021
14:47

13xforever

That's nice. Maybe I'll live to see the day when their JP store doesn't throw some kind of error on every page navigation 😹

09 Apr 2021
22:06

Rafal

Congrats, looks like RavenDB is not a couch potato, and it managed to do the task with almost no overhead in disk usage vs. the raw data. However, if the goal was to optimize the data structure for quick search of highlights by user ID and book ID, I think there's still a lot of overhead even in the raw data. Take 1.35 billion records and assume big numbers: 8 bytes for the book ID, 8 bytes for the user ID and 4 bytes for the position in the book. That is 20 bytes per record, which gives us 27 GB of data. With binary storage of data and indexes we would fit everything in 64 GB. Just put it in bare Voron or BerkeleyDB and a single laptop would handle hundreds of thousands of queries per second. And you don't need clusters, sharding, caching...
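The back-of-the-envelope estimate in this comment can be checked with a small Python sketch. It assumes a packed, fixed-width record of exactly the fields named in the comment (8-byte book ID, 8-byte user ID, 4-byte position), little-endian with no alignment padding; the layout and function names are hypothetical, not how RavenDB, Voron or BerkeleyDB actually store data.

```python
import struct

# Hypothetical packed layout from the comment: book id (8 bytes),
# user id (8 bytes), position in book (4 bytes). "<" disables padding.
RECORD = struct.Struct("<QQI")


def pack_highlight(book_id: int, user_id: int, position: int) -> bytes:
    """Serialize one highlight reference into the fixed-width layout."""
    return RECORD.pack(book_id, user_id, position)


def raw_size_bytes(n_records: int) -> int:
    """Raw payload size for n records, ignoring any index overhead."""
    return n_records * RECORD.size


if __name__ == "__main__":
    n = 1_350_000_000
    print(RECORD.size)                 # bytes per record
    print(raw_size_bytes(n) / 1e9)     # decimal gigabytes
```

At 20 bytes per record, 1.35 billion records come to 27 × 10⁹ bytes, i.e. 27 GB in decimal units, before any index structures are added.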

09 Apr 2021
22:11

Oren Eini

Rafal,

The key here isn't the association of users to books; what we were working on here was the highlights. There is a sample document in the whitepaper that shows the data.

Yes, you can try to model things in the manner you describe, but then the cost of loading the data for a user request becomes much higher. You'll need to get the book contents (which may be big), scan to the relevant location, parse the content, translate markup to text, etc. It is cheaper and easier to do it the other way around.

Especially when you have to do that once per highlight, and some people do a LOT of highlights.
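To illustrate the trade-off described here, the following is a minimal Python sketch of two hypothetical highlight records: one that stores only IDs and a character range (the compact model), and one that also stores the highlighted text (the denormalized model). The field names and the use of a plain string for book content are assumptions for illustration only; a real reader would also have to parse the book format and strip markup, which this sketch omits.

```python
from dataclasses import dataclass


@dataclass
class HighlightRef:
    """Hypothetical compact record: only ids and an offset range."""
    user_id: int
    book_id: int
    start: int
    end: int


@dataclass
class HighlightDoc:
    """Hypothetical denormalized record: the highlighted text is stored too."""
    user_id: int
    book_id: int
    start: int
    end: int
    text: str


def render_from_ref(ref: HighlightRef, books: dict) -> str:
    # Every rendered highlight needs the (possibly large) book content,
    # then a slice; real books would also need markup-to-text conversion.
    content = books[ref.book_id]
    return content[ref.start:ref.end]


def render_from_doc(doc: HighlightDoc) -> str:
    # The text is already in the record, so no book lookup is needed.
    return doc.text
```

With the compact model, a user with many highlights triggers many book lookups and slices per request; the denormalized model pays for that once, at write time, by storing the text with each highlight.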

09 Apr 2021
22:11

Oren Eini

Rafal,

Another thing: note that the data wasn't just for the highlights. The dataset includes a lot of other details that weren't relevant to this specific benchmark. They were there to show data management for large databases.

09 Apr 2021
23:01

Rafal

Yep, I must have oversimplified it. But you know, 'billions of something' looks impressive until you realize that a gigabyte is a billion bytes, and even your phone has a few GB of RAM. So not everything with a billion records is necessarily a large database that requires a datacenter (but with a careful choice of database you may well need one).

09 Apr 2021
23:06

Rafal

... which reminds me of the recent mention of Parler and their insane data overheads. The Couchbase case doesn't look that bad compared to that.

11 Apr 2021
14:40

Gabriel

Excellent read. And now go against Mongo and Cosmos DB please...

11 Apr 2021
21:53

Rafal

This is probably a difficult subject. When you go up against competing products that all serve the same purpose, all get the job done, and none of which is particularly expensive, I would not expect spectacular differences. However, as shown here, if you get a 2-3x reduction in disk usage and need half the RAM, maybe half the infrastructure, then it's substantial: not spectacular, but still worth showing. Spectacular would be, for example, negating the need for an expensive cluster or reducing the number of servers 10-fold, but this is not possible without changing the approach entirely. And IT folks are not that easy to impress. After all, they are the IT gurus in their companies, the experts and know-it-alls who made some decisions and need to prove they were right. So anyone who comes along announcing 'hey, your database is a slow, bloated, data-losing monstrosity' will be shot immediately, or at least called an idiot.

Much better, in my opinion, is to find a specialization, some niche where your product really solves a problem better than everything else out there, and then it will shine. Not sure if that applies to databases, which are very general-purpose tools, but maybe in some particular class of problems, in some specific businesses... NB: there are many specialized niche products (for example, software for handling medical data) where companies successfully sell products of inferior quality just because they hit several keywords and have some compliance certificates that technically mean nothing but that no one else has. I'm not implying that this is the way to go, but it seems a clever strategy.

12 Apr 2021
17:56

Oren Eini

Rafal,

Do note that for real-world scenarios, you can run at 8% of the hardware costs! That is better than your 10-fold scenario.
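As a quick check of that comparison: running at 8% of the hardware cost corresponds to a reduction factor of

```latex
\frac{1}{0.08} = 12.5
```

that is, roughly a 12.5-fold reduction, which exceeds the 10-fold figure mentioned above.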

12 Apr 2021
18:50

Rafal

I admit I didn't pick up that information from the article. Those are pretty big differences at some points. I don't know Couchbase at all, but maybe there's some configuration problem, or it's being used the wrong way for this data? Or does the Community Edition have some speed limit built in?

12 Apr 2021
19:02

Oren Eini

Rafal,

I pinged someone who is quite knowledgeable about how Couchbase works, and they didn't find any glaring issues in the way we set things up. We also tested the Enterprise Edition. Their license limits the detail I can expose, but it isn't a magic fix.

12 Apr 2021
20:11

Rafal

Then i hope they do the right thing at Rakuten :)
