Diving Deep on S3 Consistency
source link: https://news.ycombinator.com/item?id=26968627
GCS allows an object to be replaced conditionally with the `x-goog-if-generation-match` header, which can sometimes be quite useful.
BTW, DynamoDB supports conditional PUTs if your data can fit under 400 KiB. That can cover some of the same use cases.
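Both mechanisms are variations on compare-and-swap against object metadata. A minimal in-memory sketch of the semantics (toy model only; the class and method names are hypothetical, not the real GCS or DynamoDB APIs):

```python
class ConditionalStore:
    """Toy model of conditional PUT semantics (compare-and-swap).

    GCS's x-goog-if-generation-match header and DynamoDB's condition
    expressions both reject a write when the stored version no longer
    matches what the caller last observed.
    """

    def __init__(self):
        self._data = {}  # key -> (generation, value)

    def get(self, key):
        return self._data.get(key, (0, None))

    def put_if_generation_match(self, key, value, expected_gen):
        gen, _ = self._data.get(key, (0, None))
        if gen != expected_gen:
            return False  # precondition failed (HTTP 412 in GCS terms)
        self._data[key] = (gen + 1, value)
        return True


store = ConditionalStore()
assert store.put_if_generation_match("config", "v1", expected_gen=0)
gen, _ = store.get("config")
# A writer holding a stale generation loses the race:
assert not store.put_if_generation_match("config", "v2", expected_gen=0)
assert store.put_if_generation_match("config", "v2", expected_gen=gen)
```

The point of the conditional header is that lost updates become explicit failures the client can retry, instead of silent overwrites.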
> We built automation that can respond rapidly to load concentration and individual server failure. Because the consistency witness tracks minimal state and only in-memory, we are able to replace them quickly without waiting for lengthy state transfers.
So this means that the "system" that contains the witness(es) is a single point of truth and failure (otherwise we would lose consistency again), but because it does not have to store a lot of information, it can be kept in-memory and can be exchanged quickly in case of failure.
Or in other words: minimize the amount of state that is strictly necessary to keep the system consistent, then make that part its own in-memory, quickly failover-able subsystem, which then sets the bar for the HA component.
Is that what they did?
It's a great change.
From what I've read, if a network issue occurs which would impair consistency, S3 sacrifices availability. The write would just fail.
But this isn't your 5-node distributed system. Like they mention in the article, the witness system can remove and add nodes very quickly and it's highly redundant. A network issue that would actually cause split-brain or make it difficult to reach consensus would be few and far between.
I'm picturing a replicated, in-memory KV store where the value is some sort of version or timestamp representing the last time the object was modified. Cached reads can verify they are fresh by checking against this version/timestamp, which is acceptable because it's a network+RAM read. Is this somewhat accurate?
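If that guess is roughly right, the freshness check could look something like this sketch (all names are hypothetical; AWS hasn't published the witness internals):

```python
class Witness:
    """In-memory map of object key -> latest version (the 'minimal state')."""

    def __init__(self):
        self._versions = {}

    def record_write(self, key, version):
        self._versions[key] = version

    def latest(self, key):
        return self._versions.get(key)


class CachingReader:
    """Serves cached object bodies, but validates against the witness first."""

    def __init__(self, witness, backing):
        self.witness = witness
        self.backing = backing  # authoritative store: key -> (version, body)
        self.cache = {}         # local cache: key -> (version, body)

    def get(self, key):
        latest = self.witness.latest(key)
        cached = self.cache.get(key)
        if cached and cached[0] == latest:
            return cached[1]               # cache is provably fresh
        version, body = self.backing[key]  # miss or stale: refetch
        self.cache[key] = (version, body)
        return body


witness = Witness()
backing = {"cfg": (1, "old")}
witness.record_write("cfg", 1)

reader = CachingReader(witness, backing)
assert reader.get("cfg") == "old"

backing["cfg"] = (2, "new")     # overwrite the object...
witness.record_write("cfg", 2)  # ...and bump the witness version
assert reader.get("cfg") == "new"  # stale cache entry is bypassed
```

The cheap part is that the witness only stores the version number, never the object body, which is what lets it stay in RAM and fail over fast.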
However, even a "basic" distributed lock system (like a consistently-hashed in-memory DB, sharded across reliable servers) might provide both the scale and single source of truth that's needed. The difficulty arises when one of those servers has a hiccup.
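For reference, "consistently-hashed" here means each key maps to a shard via a hash ring, so adding or removing a server only remaps a small fraction of keys. An illustrative sketch (not any particular production system):

```python
import bisect
import hashlib


def _h(s):
    """Hash a string to a point on the ring."""
    return int(hashlib.md5(s.encode()).hexdigest(), 16)


class HashRing:
    def __init__(self, nodes, vnodes=64):
        # Each node gets several virtual points on the ring to
        # smooth out load distribution across shards.
        self._ring = sorted(
            (_h(f"{n}#{i}"), n) for n in nodes for i in range(vnodes)
        )
        self._keys = [h for h, _ in self._ring]

    def node_for(self, key):
        # First ring point clockwise from the key's hash (wrapping).
        i = bisect.bisect(self._keys, _h(key)) % len(self._ring)
        return self._ring[i][1]


ring = HashRing(["lock-a", "lock-b", "lock-c"])
owner = ring.node_for("bucket/object.json")
assert owner in {"lock-a", "lock-b", "lock-c"}
# The same key always routes to the same shard:
assert ring.node_for("bucket/object.json") == owner
```

The hiccup case mentioned above is exactly what this doesn't solve by itself: when a shard dies or is partitioned, something still has to decide authoritatively who owns its key range, which is where the consensus machinery comes back in.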
It'd be a delicious irony if it was based on hardware like an old-school mainframe or something like that.
The article seems to explain why there is a caching issue, and that's understandable, but it also reads as if they wanted to fix it. I would expect a headline in bold font if it were actually fixed.
For those curious, the problem is that S3 is "eventually consistent", which is normally not a problem. But consider a scenario where you store a config file on S3, update that config file, and redeploy your app. The way things are today you can (and yes, sometimes do) get a cached version. So now there would be uncertainty of what was actually released. Even worse, some of your redeployed apps could get the new config and others the old config.
Personally, I would be happy if there was simply an extra fee for cache-busting the S3 objects on demand. That would prevent folks from abusing it but also give the option when needed.
"Effective immediately, all S3 GET, PUT, and LIST operations, as well as operations that change object tags, ACLs, or metadata, are now strongly consistent. What you write is what you will read, and the results of a LIST will be an accurate reflection of what’s in the bucket. This applies to all existing and new S3 objects, works in all regions, and is available to you at no extra charge! There’s no impact on performance, you can update an object hundreds of times per second if you’d like, and there are no global dependencies."
Let's assume you had strong consistency in S3. If your app is distributed (tens, hundreds, or thousands of instances running) then all instances are not going to update at the same time, atomically.
You still need to design flexibility into your app to handle the case where they are not all running the same config (or software) version at the same time.
Thus, once you've built a distributed system that is able to handle a phased rollout of software/config versions (and rollback), then having cache inconsistency in S3 is no big deal.
If you really need atomic updates across a distributed system then you're looking at more expensive solutions, like DynamoDB (which does offer consistent reads), or other distributed caches.
> Thus, once you've built a distributed system that is able to handle a phased rollout of software/config versions (and rollback), then having cache inconsistency in S3 is no big deal.
But this would also mean you can't use S3 as your source of truth for config, which is precisely what a lot of people want to do.
It looks like it does exactly that now; it just wasn't clear from the article.
[0] https://docs.aws.amazon.com/systems-manager/latest/userguide...
"After a successful write of a new object, or an overwrite or delete of an existing object, any subsequent read request immediately receives the latest version of the object."