
Ext4 data corruption in stable kernels

source link: https://lwn.net/Articles/954285/

Ext4 data corruption in stable kernels

Posted Dec 10, 2023 12:08 UTC (Sun) by wtarreau (subscriber, #51152) [Link]

> The patch was marked for stable inclusion. Which means the author has demonstrated and tested the problem and has then thought it would be needed to backport it.
>
> Mistakes happen.

Definitely, there are still humans in the delivery chain. Everything went well this time and only two versions were affected in the end. I think we're just facing another grumpy user who wants a 100% guarantee of zero bugs. The same type of people who complain about unanticipated storms and then complain about a mistaken weather forecast when it announces rain that doesn't come. There's a solution to this: not using a computer, nor anything made using a computer, nor anything made by something made using a computer. Living in the woods and making fire by hitting stones can have its fun but will not necessarily be safer.

For my stable kernel usage, I *tend* to pick one or two versions older than the latest one if I see that the recent fixes are not important for me (i.e. I won't miss them). This helps avoid such cases. But that's not rocket science, and for this one I would likely have updated to that version precisely because it included an ext4 fix!

Ext4 data corruption in stable kernels

Posted Dec 10, 2023 23:27 UTC (Sun) by bgilbert (subscriber, #4738) [Link]

> The same type of people who complain about unanticipated storms and then complain about a mistaken weather forecast when it announces rain that doesn't come.

"Stable" is not a prediction of forces beyond developer control. It's an assertion of a quality bar, which needs to be backed by appropriate tools, testing, and developer time.

> For my stable kernel usage, I *tend* to pick one or two versions older than the latest one if I see that the recent fixes are not important for me (i.e. I won't miss them).

As I understand Greg KH's position, anyone applying such a policy is irresponsible for not immediately installing the newest batch of patches.

Ext4 data corruption in stable kernels

Posted Dec 11, 2023 13:47 UTC (Mon) by wtarreau (subscriber, #51152) [Link]

> > The same type of people who complain about unanticipated storms and then complain about a mistaken weather forecast when it announces rain that doesn't come.

> "Stable" is not a prediction of forces beyond developer control. It's an assertion of a quality bar, which needs to be backed by appropriate tools, testing, and developer time.

Which is exactly the case. Look at the latest 6.6.5-rc1 thread for example:
https://lore.kernel.org/all/20231205031535.163661217@linu...

I've counted 17 people responding to that thread with test reports, some of which indicate boot failures, others successes, on a total of around 910 systems covering lots of architectures, configs and setups. I think this definitely qualifies as "appropriate tools", "testing" and "developer time", and I doubt many other projects devote that amount of effort to weekly releases.

> > For my stable kernel usage, I *tend* to pick one or two versions older than the latest one if I see that the recent fixes are not important for me (i.e. I won't miss them).
>
> As I understand Greg KH's position, anyone applying such a policy is irresponsible for not immediately installing the newest batch of patches.

No; having already discussed this topic with him, I'm pretty sure he never said this. I even remember him once explaining that he doesn't want to advertise severity levels in his releases, so that users upgrade when they feel confident, not necessarily immediately nor only when it's written that this one is really important. Use cases differ so much between users that some might absolutely need to upgrade to fix a driver that's going to ruin their data, while others might prefer not to because a later fix could cause serious availability issues.

Periodically applying updates is a healthy approach; what matters is that severe bugs do not live long in the wild and that releases are frequent enough to help narrow down an occasional regression based on the various reports. I personally rebuild every time I reboot my laptop (which is quite rare thanks to suspend), and phone vendors tend to update only once every few months, and that's already OK.

Ext4 data corruption in stable kernels

Posted Dec 11, 2023 15:44 UTC (Mon) by bgilbert (subscriber, #4738) [Link]

> I've counted 17 people responding to that thread with test reports, some of which indicate boot failures, others successes, on a total of around 910 systems covering lots of architectures, configs and setups. I think this definitely qualifies as "appropriate tools", "testing" and "developer time", and I doubt many other projects devote that amount of effort to weekly releases.

Many other projects have CI tests that are required to pass before a new release can ship. If that had been the case for LTP, this regression would have been avoided. What's more, the problem was reported to affect 6.1.64 during its -rc period, but no action was taken to fix that release. 6.1.64 was released with the problem four days later.

Mistakes happen! But this is an opportunity to improve processes to prevent a recurrence, rather than accepting the status quo.

> No; having already discussed this topic with him, I'm pretty sure he never said this. I even remember him once explaining that he doesn't want to advertise severity levels in his releases, so that users upgrade when they feel confident, not necessarily immediately nor only when it's written that this one is really important. Use cases differ so much between users that some might absolutely need to upgrade to fix a driver that's going to ruin their data, while others might prefer not to because a later fix could cause serious availability issues.

I have personally been complained at by Greg for fixing a stable kernel regression via cherry-pick, rather than shipping the latest release directly to distro users. I've seen similarly aggressive messaging in other venues. In fact, the standard release announcement says:

All users of the x.y kernel series must upgrade.

If downstream users are intended to take a more cautious approach, the messaging should be clarified to reflect that.

Ext4 data corruption in stable kernels

Posted Dec 12, 2023 4:52 UTC (Tue) by wtarreau (subscriber, #51152) [Link]

> but no action was taken to fix that release. 6.1.64 was released with the problem four days later.

You should really see that as a pipeline. Even if the issue was reported, you don't know if it was noticed before 6.1.64 was emitted. What matters is that the issue was quickly fixed. Sure, we're still missing a way to tag certain versions as broken, as happened with 2.4.11, which was marked "dontuse" in the download repository. But it's important to understand that the constant flow of fixes doesn't make it easy to cancel a release instantly.

I would not be shocked to see three consecutive kernels released and tagged as "ext4 broken" there for the time it takes to learn of the breakage and fix it.

> I have personally been complained at by Greg for fixing a stable kernel regression via cherry-pick, rather than shipping the latest release directly to distro users.

Here you're speaking about cherry-picking fixes. That's something extremely dangerous that nobody should ever do, yet some distros have been doing it for a while, sometimes shipping kernels that remain vulnerable for months or years due to this bad practice. The reason for recommending against cherry-picking is very simple (and was explained at length at multiple conferences): the ONLY combinations of kernel patches that are both tested and supported by the subsystem maintainers are the mainline and stable ones. If you perform any other assembly of patches, nobody knows if they work well together or if another important patch is missing (as happened above). Here the process worked fine because developers reported the missing patches. Imagine if you had taken that single patch yourself: nobody would have known, and you could have corrupted a lot of your users' filesystems.

So please, for your users, never ever cherry-pick random patches from stable. Take the whole stable release, possibly a slightly older one if you don't feel comfortable with the latest changes, and add your distro-specific patches on top of it, but do not pick only what seems relevant to you; that will eventually result in a disaster and nobody will support you for having done it.
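
To make the contrast concrete, here is a minimal sketch in git terms of the two approaches being discussed; the commit hash and patch directory are placeholders, not references to the actual ext4 fix:

    # Discouraged: graft one upstream fix onto a distro tree; the resulting
    # combination of patches has been tested by nobody else.
    git cherry-pick 1234567890ab

    # Recommended: start from a complete stable release, which is exactly what
    # the stable maintainers and testers validated, then add the
    # distro-specific patches on top.
    git checkout -b distro/6.1.65 v6.1.65
    git am ../distro-patches/*.patch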

Ext4 data corruption in stable kernels

Posted Dec 12, 2023 9:58 UTC (Tue) by bgilbert (subscriber, #4738) [Link]

> Even if the issue was reported, you don't know if it was noticed before 6.1.64 was emitted. What matters is that the issue was quickly fixed.

The message I linked above is dated November 24 and reported a regression in v6.1.64-rc1. The testing deadline for 6.1.64 was November 26, and it was released on November 28. That report was sufficient to cause a revert in 5.10.y and 5.15.y, so I don't think there can be an argument that not enough information was available.

The users who had data corruption, or who had to roll out an emergency fix to avoid data corruption, don't care that the issue was quickly fixed. They can always roll back to an older kernel if they need to. They care that the problem happened in the first place.

> The reason for recommending against cherry-picking is very simple (and was explained at length at multiple conferences): the ONLY combinations of kernel patches that are both tested and supported by the subsystem maintainers are the mainline and stable ones. [...] Take the whole stable release, possibly a slightly older one if you don't feel comfortable with the latest changes, and add your distro-specific patches on top of it, but do not pick only what seems relevant to you; that will eventually result in a disaster and nobody will support you for having done it.

What are you talking about? If I ship a modified kernel and it breaks, of course no one will support me for having done so. If I ship an unmodified stable kernel and it breaks, no one will support me then either! The subsystem maintainers aren't going to help with my outage notifications, my users, or my emergency rollout. As with any downstream, I'm ultimately responsible for what I ship.

In the case mentioned upthread, my choices were: a) cherry-pick a one-line fix for the userspace ABI regression, or b) take the entire diff from 4.14.96 to 4.14.97: 69 patches touching 92 files, +1072/-327 lines. Option b) is simply not defensible release engineering. If I can't hotfix a regression without letting in a bunch of unrelated code, I'll never converge to a kernel that's safe to ship. That would arguably be true even if stable kernels didn't have a history of user-facing regressions, which they certainly did.
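
For reference, the size of option b) can be checked against a checkout of the linux-stable tree with something like the following (exact counts may differ slightly depending on how the version-bump commit is counted):

    git rev-list --count v4.14.96..v4.14.97          # patches in the release
    git diff --stat v4.14.96..v4.14.97 | tail -n 1   # files changed, +/- lines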

This discussion is a great example of the problem I'm trying to describe. Stable kernels are aggressively advertised as the only safe kernels to run, but there's plenty of evidence that they aren't safe, and the stable maintainers tend to denigrate and dismiss users' attempts to point out the structural problems — or even to work around them! These problems can be addressed, as I said, with tools, testing, and developer time. There is always, always, always room for improvement. But that will only happen if the stable team decides to make improvement a priority.

Ext4 data corruption in stable kernels

Posted Dec 12, 2023 10:19 UTC (Tue) by geert (subscriber, #98403) [Link]

Playing the devil's advocate (which can be considered appropriate for v6.6.6 ;-)

> Here you're speaking about cherry-picking fixes. That's something extremely dangerous that nobody must ever do [...]

But stable is also cherry-picking some changes, but not others?!?!? Nobody knows if they work well together or if another important patch is missing...

The only solution is to follow mainline ;-)

Ext4 data corruption in stable kernels

Posted Dec 12, 2023 0:16 UTC (Tue) by roc (subscriber, #30627) [Link]

> I've counted 17 people responding to that thread with test reports, some of which indicate boot failures, others successes, on a total of around 910 systems covering lots of architectures, configs and setups.

Relying on volunteers to manually build and boot RC kernels is both inefficient and inadequate. There should be dedicated machines that automatically build and boot those kernels AND run as many automated tests as can be afforded given the money and time available. With some big machines and 48 hours you can run a lot of tests.

This isn't asking for much. This is what other mature projects have been doing for years.
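
As a sketch of what one such dedicated machine could do for every release candidate (the initramfs path and the BOOT-OK marker are assumptions for illustration, not an existing kernel.org service):

    # Build the -rc kernel and smoke-boot it under QEMU with a tiny prebuilt
    # initramfs whose /init prints BOOT-OK and powers off.
    make -j"$(nproc)" defconfig
    make -j"$(nproc)" bzImage
    timeout 300 qemu-system-x86_64 -nographic -no-reboot \
        -kernel arch/x86/boot/bzImage \
        -initrd ../initramfs.cpio.gz \
        -append "console=ttyS0 panic=-1" | tee boot.log
    grep -q "BOOT-OK" boot.log || echo "boot test FAILED"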

Ext4 data corruption in stable kernels

Posted Dec 12, 2023 4:55 UTC (Tue) by wtarreau (subscriber, #51152) [Link]

> There should be dedicated machines that automatically build and boot those kernels AND run as many automated tests as can be afforded given the money and time available. With some big machines and 48 hours you can run a lot of tests.
>
> This isn't asking for much. This is what other mature projects have been doing for years.

Well, if you and/or your employer can provide this (hardware and manpower to operate it), I'm sure everyone will be extremely happy. Greg is constantly asking for more testers. You're speaking as if some proposal for help was rejected; resources like this don't fall from the sky. Also, you seem to know what tests to run on them, so please do! All the testers I mentioned run their own tests from different (and sometimes overlapping) sets, and that's extremely useful.

But when someone says "this or that should be done", the question remains: by whom, if not the one suggesting it?

Ext4 data corruption in stable kernels

Posted Dec 12, 2023 7:16 UTC (Tue) by roc (subscriber, #30627) [Link]

Regarding which tests to run: as bgilbert said: "Many other projects have CI tests that are required to pass before a new release can ship. If that had been the case for LTP, this regression would have been avoided." Of course the LTP test *did* run; it's not just about having the tests and running the tests, but also gating the release on positive test results.

As it happens my rr co-maintainer Kyle Huey does regularly test RC kernels against rr's regression test suite, and has found (and reported) a few interesting bugs that way. But really the Linux Foundation or some similar organization should be responsible for massive-scale automated testing of upstream kernels. Lots of companies stand to benefit financially from more reliable Linux releases, and as I understand it, the LF exists to channel those common interests into funding.

Ext4 data corruption in stable kernels

Posted Dec 12, 2023 10:11 UTC (Tue) by bgilbert (subscriber, #4738) [Link]

Suppose the stable team announced their intention to gate stable releases on automated testing, and put out a call for suitable test suites. Test suites could be required to meet a defined quality bar (low false positive rate, completion within the 48-hour review period, automatic bisection), and any suite that repeatedly failed to meet the bar could be removed from the test program. If no one at all stepped up to offer their tests, I would be shocked.

The stable team wouldn't need to own the test runners, just the reporting API, and the API could be quite simple. I agree with roc that the Linux Foundation should take some financial responsibility here, but I suspect some organizations would run tests and contribute results even if no funding were available.
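
To make that concrete, the reporting API could be as small as accepting one structured result per release candidate per test suite. The payload below is purely hypothetical (there is no such kernel.org endpoint); it is only an illustration of how little would be needed:

    {
      "release": "6.6.5-rc1",
      "suite": "ltp-fs",
      "result": "fail",
      "systems_tested": 12,
      "log_url": "https://ci.example.org/logs/6.6.5-rc1/ltp-fs",
      "contact": "ci-bot@example.org"
    }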

Ext4 data corruption in stable kernels

Posted Dec 11, 2023 14:02 UTC (Mon) by farnz (subscriber, #17727) [Link]

Greg's position is a lot less concrete than that - it's "I make no assertions about whether or not any given batch of patches fixes bugs you care about; if you want all the fixes I think you should care about, then you must take the latest batch". Whether you want all the fixes that Greg thinks you should is your decision - but he makes no statement about what subset of stable patches you should pick in that case.

Ext4 data corruption in stable kernels

Posted Dec 10, 2023 12:10 UTC (Sun) by Wol (subscriber, #4433) [Link]

> Because it is marked
> CC: stable@vger.kernel.org

> Mistakes happen.

It's not always a mistake. As usual, we are using technology to try and fix a social problem. Some upstreams, I believe, have a habit of cc'ing *everything* to stable. If they've triaged it, and it's important enough, then fine. Too many of them don't.
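
For readers not familiar with the mechanism: a patch is marked for stable by adding a trailer to its commit message, and the stable maintainers pick it up once it lands in mainline. An illustrative (not real) example of such trailers:

    Fixes: 1234567890ab ("ext4: some earlier change")
    Cc: stable@vger.kernel.org # 6.1+
    Signed-off-by: Jane Developer <jane@example.org>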

The stable maintainers don't have time to triage everything. Upstream sometimes cannot be bothered to triage everything (or expects someone else to do it for them). What do you expect?

Cheers,
Wol

Ext4 data corruption in stable kernels

Posted Dec 10, 2023 19:42 UTC (Sun) by saffroy (guest, #43999) [Link]

> The stable maintainers don't have time to triage everything. Upstream sometimes cannot be bothered to triage everything (or expects someone else to do it for them). What do you expect?

What I did expect until today was that existing well-known test suites like LTP (which revealed the bug) would be on the critical path for a stable release.

I am very curious why they are not.
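
For reference, running LTP against a candidate kernel is not a heavyweight exercise; from a checkout of the test suite it is roughly the following (the choice of the fs command file and the scratch directory here are illustrative, not necessarily the scenario that caught this bug):

    make autotools && ./configure && make -j"$(nproc)" && make install
    cd /opt/ltp
    ./runltp -f fs -d /mnt/scratch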

Ext4 data corruption in stable kernels

Posted Dec 11, 2023 14:31 UTC (Mon) by intgr (subscriber, #39733) [Link]

> existing well-known test suites like LTP (which revealed the bug)

Interesting fact. Do you have a link to it? And any discussions that followed?

Ext4 data corruption in stable kernels

Posted Dec 11, 2023 15:39 UTC (Mon) by Kamiccolo (subscriber, #95159) [Link]

It was posted in this thread some time ago:
https://lore.kernel.org/stable/81a11ebe-ea47-4e21-b5eb-53...

I'd say LTP deserves at least a little bit more love ;)

Ext4 data corruption in stable kernels

Posted Dec 10, 2023 12:41 UTC (Sun) by rolexhamster (guest, #158445) [Link]

> No. It would just delay a lot of fixes.

Perhaps that would be a good thing, especially when it comes to critical subsystems. Filesystems should not eat data. Was that commit really necessary for backporting to stable?

There's a big difference between a few corrupted pixels (say due to a bug in DRM or GPU driver), and a few corrupted files. The former is a nuisance, while the latter is a critical failure. Maybe it would be useful to classify stuff sent to stable@kernel as high/med/low risk, based on what subsystem it touches.

Ext4 data corruption in stable kernels

Posted Dec 10, 2023 13:04 UTC (Sun) by mb (subscriber, #50428) [Link]

>Filesystems should not eat data. Was that commit really necessary for backporting to stable?

By your own reasoning that filesystems should not eat data, it was necessary.

From the commit message:

>on ext4 O_SYNC direct IO does not properly
>sync file size update and thus if we crash at unfortunate moment, the
>file can have smaller size although O_SYNC IO has reported successful
>completion.

It was supposed to prevent data corruption.
-> "Filesystems should not eat data."
-> We must apply it to stable a.s.a.p.
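
To make the quoted failure mode concrete: the pattern at risk is an application that extends a file with O_SYNC direct I/O and trusts that a successful return means both the data and the new file size are on disk. A minimal sketch of such a writer (error handling trimmed, 4096-byte alignment assumed to satisfy O_DIRECT):

    #define _GNU_SOURCE
    #include <fcntl.h>
    #include <stdlib.h>
    #include <string.h>
    #include <unistd.h>

    int main(void)
    {
        /* O_SYNC promises that a successful write is durable, including the
         * file size update when the write extends the file. */
        int fd = open("data.bin", O_WRONLY | O_CREAT | O_DIRECT | O_SYNC, 0644);
        if (fd < 0)
            return 1;

        void *buf;
        if (posix_memalign(&buf, 4096, 4096))  /* O_DIRECT needs aligned buffers */
            return 1;
        memset(buf, 0xab, 4096);

        /* This write extends the file past its old end. With the bug described
         * above, a crash right after pwrite() returns could leave the data
         * blocks on disk but the old, smaller size in the inode. */
        if (pwrite(fd, buf, 4096, 0) != 4096)
            return 1;

        close(fd);
        return 0;
    }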

>Maybe it would be useful to classify stuff sent to stable@kernel as high/med/low risk,
>based on what subsystem it touches.

And that would have prevented this fix from being applied to stable?
I doubt it. It was supposed to avoid corruption.

I am not saying that things went well here and I am not saying the stable process is perfect.
But in reality such problems happen rarely in stable.
Stable is not supposed to be an enterprise kernel. It is supposed to collect fixes with the least amount of manual work possible. That is guaranteed to introduce bugs sooner or later. But I think it's the best we can do at this level.

I don't think any kind of risk classification can help here.
It's basically the same problem as with security fixes. People will just start to argue whether a fix actually is really needed or not. And wrong decisions will then be made. Which will also lead to buggy stable kernels.

Ext4 data corruption in stable kernels

Posted Dec 11, 2023 9:06 UTC (Mon) by cloehle (subscriber, #128160) [Link]

The main issue with delaying these fixes further is that a good chunk of them are or can be security-related, and thus it's obvious they should be deployed asap.

Ext4 data corruption in stable kernels

Posted Dec 11, 2023 9:53 UTC (Mon) by rolexhamster (guest, #158445) [Link]

Then perhaps the security fixes should always be clearly labeled as such in the list of changes, preferably with an associated CVE number (or !CVE if that's more amenable).

Otherwise we're in the security by obscurity weeds, where the "all-users-must-upgrade" upgrades are being applied in an uninformed manner, pulling in both the wheat and the chaff.

Ext4 data corruption in stable kernels

Posted Dec 11, 2023 14:20 UTC (Mon) by cloehle (subscriber, #128160) [Link]

Absolutely not; the kernel community rejects the CVE system, and for very good reasons.

It is a completely unreasonable amount of effort to categorize bugs into "very likely not security-related" and "security-related"; in fact, everyone who attempts this (most vendors) messes up regularly, which is a huge weakness of the CVE (and !CVE) systems.

Ext4 data corruption in stable kernels

Posted Dec 11, 2023 15:19 UTC (Mon) by rolexhamster (guest, #158445) [Link]

Using that logic ("all-users-must-upgrade"), all patches in a given stable release are both security fixes and not security fixes at the same time (in other words, heisen-patches, fashioned after heisenbugs). That's a cop-out.

If a patch (or collection of patches) fixes an existing CVE/!CVE, why not simply state that in the changelog? This is distinct and separate from asking to categorize each bug/patch as "Very likely not security-related" and "Security-related".

Ext4 data corruption in stable kernels

Posted Dec 11, 2023 15:50 UTC (Mon) by cloehle (subscriber, #128160) [Link]

>Using that logic ("all-users-must-upgrade"), all patches in a given stable release are both security fixes and not security fixes at the same time.

And that is kind of the current situation, although strangely worded.
The kernel community doesn't make the distinction: don't run a kernel that contains already-fixed bugs.

To get a CVE, many vendors require you to actually prove an exploit, and that is often orders of magnitude more effort for both the reporter and the CNA to verify; for now the kernel community would rather spend the potential days to months fixing things than wondering "how could this bug be exploited somehow?".

Ext4 data corruption in stable kernels

Posted Dec 11, 2023 16:50 UTC (Mon) by farnz (subscriber, #17727) [Link]

Without first determining whether a given patch is, or is not, a fix for a CVE/!CVE, how do I state the CVE number in the changelog? Bear in mind that at the point I write the patch, I may just be fixing a bug I've seen, without realising that it's security-relevant, or indeed that someone has applied for a CVE number for it.

Ext4 data corruption in stable kernels

Posted Dec 11, 2023 15:13 UTC (Mon) by farnz (subscriber, #17727) [Link]

That requires someone (who's willing to stand behind their effort) to look over all of the changes, and tell you which ones have no security relevance. And thinking this way reveals the problem with "I only apply the security-relevant bugfixes"; to do that, you first need to know which bugfixes are security relevant, which in turn implies that you know which bugfixes are not security relevant.

If you merely take all bugfixes that are known to be security relevant, then you're engaging in theatre; there will always be security relevant bugfixes that aren't known to be security relevant, either because no-one in the chain from the bug finder to Greg recognised that this bug had security relevance, or because people who recognised that it was security relevant chose to hide that fact for reasons of their own (e.g. because they work for the NSA, want future kernels to be fixed, but benefit from people not rushing to backport the fix to a vendor's 3.3 kernel).

Ext4 data corruption in stable kernels

Posted Dec 11, 2023 13:52 UTC (Mon) by wtarreau (subscriber, #51152) [Link]

> No. It would just delay a lot of fixes.
>
> Perhaps that would be a good thing, especially when it comes to critical subsystems.

No, it would just leave users exposed to them for longer and make them appear together with many more related fixes, making it even harder to spot the culprit. The problem is that some users absolutely want to push the responsibility onto someone else:
- a fix is missing, what are you doing maintainers, couldn't you pick it for stable ?
- a fix broke my system, what are you doing maintainers, couldn't you postpone it ?

It will never change anyway, but it will continue to add lines here on LWN :-)

