Whitepaper: Couchbase vs RavenDB Performance at Rakuten Kobo

 3 years ago
source link: https://ayende.com/blog/193633-A/whitepaper-couchbase-vs-ravendb-performance-at-rakuten-kobo?Key=e5a14a5f-aaae-4bb3-ae25-05c947367661&utm_campaign=Feed%3A+AyendeRahien+%28Ayende+%40+Rahien%29

Comments

13xforever
09 Apr 2021
14:47 PM

That's nice, maybe I'll live to see the day when their JP store won't throw some kind of error on every page navigation 😹

Rafal
09 Apr 2021
22:06 PM

Congrats, looks like RavenDB is not a couch potato. And it managed to do the task with almost no overhead in disk usage vs raw data.

However, I wonder: if the goal was to optimize the data structure for quick search of highlights by user id and book id, I think there's still a lot of overhead even in the raw data. 1.35 billion records, assume big numbers and let's take 8 bytes for book id, 8 bytes for user id and 4 bytes for position in the book: this gives us 27 GB of data. With binary storage of data and indexes we would fit everything in 64 GB. Just put it in bare Voron or BerkeleyDB, and a single laptop would handle hundreds of thousands of queries per second. And you don't need clusters, sharding, caching...
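The 27 GB figure in this comment can be checked with a quick back-of-envelope calculation. The sketch below is illustrative only: the constant and function names are made up for this example, and the field widths are the ones the commenter assumes, not the actual record layout used at Rakuten Kobo.

```python
# Back-of-envelope check of the raw-size estimate from the comment above.
# All names here are hypothetical; field widths are the commenter's assumptions.
RECORDS = 1_350_000_000   # 1.35 billion highlight records
BOOK_ID_BYTES = 8         # assumed width of a book id
USER_ID_BYTES = 8         # assumed width of a user id
POSITION_BYTES = 4        # assumed width of a position within the book


def raw_size_bytes(records: int, *field_sizes: int) -> int:
    """Size of `records` fixed-width rows with no index or storage overhead."""
    return records * sum(field_sizes)


total = raw_size_bytes(RECORDS, BOOK_ID_BYTES, USER_ID_BYTES, POSITION_BYTES)
print(f"{total / 10**9:.1f} GB")  # prints "27.0 GB" (decimal gigabytes)
```

This covers only the bare (book id, user id, position) tuples. It says nothing about the highlight text itself or about index size, which the replies below point out is where the real data lives.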

Oren Eini
09 Apr 2021
22:11 PM

Rafal,

The key here isn't the association of users to books; what we were working with here was the highlights. There is a sample document there that shows the data.

Yes, you can try to model things in the manner you describe, but then the cost of loading the data for a user request becomes much higher. You'll need to get the book contents (may be big), scan to the relevant location, parse the content, translate markup to text, etc. It is cheaper and easier to do it the other way around.

Especially when you have to do that once per highlight, and some people do a LOT of highlights.

Oren Eini
09 Apr 2021
22:11 PM

Rafal,

Another thing, note that the data wasn't just for the highlights. The dataset includes a lot of other details which weren't relevant for this specific benchmark. They were there to show data management for large databases.

Rafal
09 Apr 2021
23:01 PM

Yep, I must have oversimplified it. But you know, 'billions of something' looks impressive until you realize that a gigabyte is a billion bytes, and even your phone has a few GB of RAM. So not everything with a billion records is necessarily a large database that requires a datacenter (but with a careful choice of database you may well need one).

Rafal
09 Apr 2021
23:06 PM

... which reminds me of the recent mention of Parler and their insane data overheads. The Couchbase case doesn't look that bad compared to that.

Gabriel
11 Apr 2021
14:40 PM

Excellent read. And now go against Mongo and Cosmos DB please...

Rafal
11 Apr 2021
21:53 PM

This is probably a difficult subject: going against competing products while knowing they all serve the same purpose, all get the job done, and none is particularly expensive. I would not expect spectacular differences. However, as shown here, if you get disk usage reduced by a factor of 2-3, and need half the RAM, maybe half the infrastructure, then it's substantial; not spectacular, but still worth showing. Spectacular would be, for example, negating the need for an expensive cluster or reducing the number of servers 10-fold, but this is not possible without changing the approach entirely. And IT folk are not that easy to impress. After all, they are the IT gurus in their companies, the experts and know-it-alls, who made some decisions and need to prove they were right. So anyone coming and announcing 'hey, your database is a slow, bloated, data-losing monstrosity' will be shot immediately or at least called an idiot.

Much better, in my opinion, is to find a specialization, some niche where your product really solves some problem better than everything else out there, and then it will shine. Not sure if it applies to databases, a very general-purpose tool, but maybe in some particular class of problems, in some specific businesses... NB, there are many specialized, niche products (for example, software for handling medical data) where companies can successfully sell products of inferior quality just because they get a hit on several keywords, or have some compliance certificates that technically mean nothing but that no one else has. Not implying that this is the way to go, but it seems a clever strategy.

Oren Eini
12 Apr 2021
17:56 PM

Rafal,

Do note that for real world scenarios, you can run at 8% of the hardware costs! That is better than your 10-fold scenario.

Rafal
12 Apr 2021
18:50 PM

I admit I didn't parse that information from the article. Pretty bad differences at some points. I don't know Couchbase at all, but maybe there's some configuration problem, or it's used in a wrong way for the data? Or the community edition has some speed limit built in?

Oren Eini
12 Apr 2021
19:02 PM

Rafal,

I pinged someone who is quite knowledgeable in how Couchbase works, and they didn't find any glaring issues in the way we set things up. We also tested the Enterprise edition. Their license limits the detail I can expose, but it isn't a magic fix.

Rafal
12 Apr 2021
20:11 PM

Then I hope they do the right thing at Rakuten :)
