Modern C for Fedora (and the World)
source link: https://lwn.net/Articles/954018/
Modern C for Fedora (and the world)
Posted Dec 8, 2023 17:21 UTC (Fri) by cjwatson (subscriber, #7322) [Link]
Modern C for Fedora (and the world)
Posted Dec 9, 2023 10:59 UTC (Sat) by fw (subscriber, #26023) [Link]
But it turns out that on x86-64, without PIE, global data, constants, and the heap all are in the first 32 bits of the address space. Even today, only the stack is outside that range. So you get surprisingly far with 32-bit pointers only. It really shouldn't work, but it does in many cases. But of course PIE changes that.
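A minimal sketch of why this can work, written in Rust like the other snippets in this discussion. It assumes the pointer is squeezed through a 32-bit signed `int` (as happens with an implicitly declared function) and sign-extended on the way back, which is how GCC documents its integer-to-pointer conversion. The addresses are hypothetical values chosen for illustration, not measurements, and `round_trip_through_int` is a made-up helper name.

```rust
/// Models a 64-bit address squeezed through a 32-bit signed `int` and
/// converted back to a pointer-sized value with sign extension.
fn round_trip_through_int(addr: u64) -> u64 {
    let truncated = addr as u32 as i32; // keep only the low 32 bits
    truncated as i64 as u64 // sign-extend back to 64 bits
}

fn main() {
    // Hypothetical addresses, chosen only to illustrate the three cases.
    let samples = [
        ("low address (non-PIE data or heap)", 0x0000_0000_0040_1000_u64),
        ("between 2 GiB and 4 GiB", 0x0000_0000_8000_0000_u64),
        ("stack-like high address", 0x0000_7ffd_1234_5678_u64),
    ];
    for (label, addr) in samples {
        let back = round_trip_through_int(addr);
        println!("{label}: {addr:#018x} -> {back:#018x} (intact: {})", back == addr);
    }
}
```

Under the sign-extension assumption only addresses below 2 GiB come back intact. A larger heap or a PIE load address falls into one of the other two cases.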
Modern C for Fedora (and the world)
Posted Dec 16, 2023 8:16 UTC (Sat) by mpr22 (subscriber, #60784) [Link]
surely that depends on the size of your heap
Modern C for Fedora (and the world)
Posted Dec 8, 2023 17:32 UTC (Fri) by Paf (subscriber, #91811) [Link]
So the assumption that made most of that code work didn’t become wrong with the advent of 64 bit.
Fwiw, I work in a project that has long done Wall and Werror and so all of these constructs terrify me :)
Modern C for Fedora (and the world)
Posted Dec 8, 2023 19:34 UTC (Fri) by roc (subscriber, #30627) [Link]
Though, maybe compilers have stopped adding warnings to -Wall and now only add to -Wextra instead? I wish I knew.
Modern C for Fedora (and the world)
Posted Dec 8, 2023 22:23 UTC (Fri) by NYKevin (subscriber, #129325) [Link]
Eh, it depends what you want to get out of Wall/Werror. If you're a distro, of course you don't want to use it, it will break all the packages all the time. If you're an upstream, and you also require zero lint errors (for whatever linter your project is using), then this is much less problematic. By the time something makes it into -Wall, the linters have probably been complaining about it for years, and so in practice, the amount of breakage when you upgrade to a new compiler is rather limited. And you always have the option of (temporarily) doing -Wall -Wno-foo if a particular warning causes issues.
Modern C for Fedora (and the world)
Posted Dec 9, 2023 7:56 UTC (Sat) by wtarreau (subscriber, #51152) [Link]
Modern C for Fedora (and the world)
Posted Dec 9, 2023 11:09 UTC (Sat) by Sesse (subscriber, #53779) [Link]
Modern C for Fedora (and the world)
Posted Dec 8, 2023 22:40 UTC (Fri) by fwiesweg (subscriber, #116364) [Link]
On the other hand, if you are able to keep up with the load, it's about the best thing you can do. With each modernization push, enforced by making warnings fail hard, the number of runtime errors can be brought down considerably, and by now nearly all issues we have are caused by missing or disabled static checks.
Of course, updating was gruesome, tedious work, but it makes life afterwards much more relaxed and enjoyable. I even ran a Friday deployment today without being overly worried, something I'd never have done just five years ago. In the long run, -Wall was really worth it.
Modern C for Fedora (and the world)
Posted Dec 8, 2023 23:45 UTC (Fri) by roc (subscriber, #30627) [Link]
Modern C for Fedora (and the world)
Posted Dec 9, 2023 0:30 UTC (Sat) by pbonzini (subscriber, #60935) [Link]
Modern C for Fedora (and the world)
Posted Dec 9, 2023 1:02 UTC (Sat) by roc (subscriber, #30627) [Link]
Currently we build with -Werror -Wall for CMAKE_BUILD_TYPE=DEBUG, and not for CMAKE_BUILD_TYPE=RELEASE. That's assuming developers build regularly with DEBUG and people who just want a working upstream build don't. It works out OK in practice. It doesn't seem ideal but maybe it's about as good as it can be.
Modern C for Fedora (and the world)
Posted Dec 9, 2023 8:15 UTC (Sat) by pm215 (subscriber, #98099) [Link]
Modern C for Fedora (and the world)
Posted Dec 9, 2023 11:11 UTC (Sat) by smcv (subscriber, #53363) [Link]
Modern C for Fedora (and the world)
Posted Dec 9, 2023 16:41 UTC (Sat) by pm215 (subscriber, #98099) [Link]
Unless the compiler authors commit to "-Og will never lose debug info that is present in -O0" I personally use and advise others to use -O0.
Modern C for Fedora (and the world)
Posted Dec 9, 2023 12:11 UTC (Sat) by kreijack (guest, #43513) [Link]
People building from upstream should be able to deal with these kinds of issues; usually this means reading a README file.
If not, they should use a distro package.
> Currently we build with -Werror -Wall for CMAKE_BUILD_TYPE=DEBUG, and not for CMAKE_BUILD_TYPE=RELEASE. That's assuming developers build regularly with DEBUG and people who just want a working upstream build don't. It works out OK in practice. It doesn't seem ideal but maybe it's about as good as it can be.
This is a sane principle.
Modern C for Fedora (and the world)
Posted Dec 9, 2023 8:35 UTC (Sat) by marcH (subscriber, #57642) [Link]
What I found to work well is to have -Werror added only in pre-merge CI. Not having it by default makes prototyping more convenient.
This is consistent with running linters in pre-merge CI while not forcing developers to run them all the time.
None of this approach is specific to C.
Of course you need to have some pre-merge CI in the first place. If you don't even have that minimal level of CI then the project is basically unmaintained.
Modern C for Fedora (and the world)
Posted Dec 9, 2023 14:46 UTC (Sat) by mathstuf (subscriber, #69389) [Link]
Modern C for Fedora (and the world)
Posted Dec 9, 2023 17:54 UTC (Sat) by marcH (subscriber, #57642) [Link]
This being said, the simplest and best solution is to compile twice: once without -Werror and once with -Werror. This can be in two separate (and clearly labeled) runs or even consecutively. The first run shows all warnings and the second blocks the merge.
This is a bit similar to the `make || make -j1` technique that avoids (real) errors being drowned by many threads and confusing developers.
Modern C for Fedora (and the world)
Posted Dec 9, 2023 21:43 UTC (Sat) by mathstuf (subscriber, #69389) [Link]
I'll do an initial run on all of the CI configurations to get a survey of what is broken and then focus on what is broken after that (I don't build all of the configurations locally to know anyways).
Modern C for Fedora (and the world)
Posted Dec 10, 2023 0:53 UTC (Sun) by marcH (subscriber, #57642) [Link]
If you have a good test framework that does all that for you then you should absolutely ignore my previous post. Not everyone is that lucky. I mean many projects don't even have any pre-merge CI at all (yet?). Remember that the main article is about Fedora and others stepping up to rescue orphaned projects coded in ancient C. In such a context my simple advice above definitely holds because it's just one extra line in your CI configuration. Super cheap and very high value and something people not familiar with CI may think about.
> Developers aren't looking through build logs.
They don't by default (assuming of course you have developers in the first place...)
They definitely do when there's a CI red light somewhere that threatens the merge of their code and maybe their deadline. In such a case I know from first-hand experience that they really enjoy the simple "tricks" I recommended above.
> and uploads it to CDash for viewing.
I don't know anything about CDash but I know neither GitHub nor Jenkins nor Gitlab has any "yellow light"/warning concept, it's either green/pass or red/fail. Running twice with and without -Werror also solves that display limitation problem extremely cheaply. Again: if you have a smarter and better viewer then by all means ignore my tricks.
> We also don't need the second `-Werror` run (which pollutes the build cache)
Curious what you mean here.
Modern C for Fedora (and the world)
Posted Dec 10, 2023 4:10 UTC (Sun) by mathstuf (subscriber, #69389) [Link]
GitLab-CI does have a "warning" mode with the `allow_failure` key[1]. We use exit code 47 to indicate "warnings happened" so that the testing can proceed even though the build made warning noise. There are issues with PowerShell exit code extraction and that always hard-fails, but that seems to be a gitlab-runner issue (it worked before we upgraded for other reasons). It's actually nifty because it still reports as a `failed` *state* and the `allow_failure` key on the job just changes the render and "can dependent jobs proceed" logic, so our merge robot just sees that state and says "no" to merging.
> > We also don't need the second `-Werror` run (which pollutes the build cache)
> Curious what you mean here.
We have a shared cache for CI (`sccache`-based; `buildcache` on Windows). Adding another set of same-object output for a different set of flags just removes space otherwise ideally suited for storing other build results (*maybe* the object is deduplicated, but it doesn't seem necessary to me; probably backend-dependent anyways).
Modern C for Fedora (and the world)
Posted Dec 9, 2023 15:25 UTC (Sat) by Paf (subscriber, #91811) [Link]
So we have circumstances that are a bit different, I think.
Modern C for Fedora (and the world)
Posted Dec 9, 2023 1:24 UTC (Sat) by Wol (subscriber, #4433) [Link]
That's what I did with a code base. Just worked through the codebase adding -W3 to each module in turn, and cleared all the errors. It took time, but the quality of the code base shot up, and loads of unexplained errors just disappeared :-)
Cheers,
Wol
Modern C for Fedora (and the world)
Posted Dec 11, 2023 14:48 UTC (Mon) by rgmoore (✭ supporter ✭, #75) [Link]
A reasonable way to think about this is to treat all those compiler warnings as technical debt. Paying off that technical debt will be painful, especially if you have a lot of it, but it's probably worth it in the long run. The big cost will be when you take a project that has allowed the warnings to pile up and suddenly force everyone to spend time fixing those warnings rather than develop anything new. Dealing with new warnings as compilers change their mind about what deserves a warning will be more manageable. The main problem in that case is letting the compiler writers dictate when you pay off your technical debt rather than making the decision yourself.
Modern C for Fedora (and the world)
Posted Dec 11, 2023 16:46 UTC (Mon) by Wol (subscriber, #4433) [Link]
We had a project where we couldn't suppress a particular warning (MSC v6, -W4, bought in library, unused arguments. Catch 22, we could fix warning A, but the fix triggered warning B, cue endless loop).
Anyways, our standards said "All warnings must be explained and understood". So that one we just ignored. There's no reason a project can't say "it's an old warning, we haven't got round to fixing it". But any new warning in modified code is an instant QA failure.
Cheers,
Wol
Modern C for Fedora (and the world)
Posted Dec 9, 2023 16:24 UTC (Sat) by jwarnica (subscriber, #27492) [Link]
Introducing a new compiler version is a significant step. Perhaps you will need a development branch to work through that, but you should either never change compiler versions, or actually do all that is needed when you do....
Which could well be disabling particular checks in the build process. But if you said Wall, then you have implicitly deferred to the compiler people's taste.
Modern C for Fedora (and the world)
Posted Dec 9, 2023 22:42 UTC (Sat) by quotemstr (subscriber, #45331) [Link]
Modern C for Fedora (and the world)
Posted Dec 10, 2023 11:25 UTC (Sun) by joib (subscriber, #8541) [Link]
Modern C for Fedora (and the world)
Posted Dec 14, 2023 12:04 UTC (Thu) by spacefrogg (subscriber, #119608) [Link]
Modern C for Fedora (and the world)
Posted Dec 8, 2023 17:36 UTC (Fri) by Hello71 (subscriber, #103412) [Link]
Modern C for Fedora (and the world)
Posted Dec 11, 2023 8:10 UTC (Mon) by jengelh (subscriber, #33263) [Link]
Modern C for Fedora (and the world)
Posted Dec 13, 2023 1:52 UTC (Wed) by marcH (subscriber, #57642) [Link]
Big endian is more "human-friendly" because you can read hexdumps "as is" (because humans use big endian too)
Little endian is more "computer-friendly" because of what you just explained.
In other words, Gulliver is wrong here.
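The explanation this comment replies to is not reproduced here, so the following is only a sketch of two commonly cited properties, not a restatement of it. A big-endian hexdump reads in the same order as the written number. In little endian the low-order bytes sit at offset 0, so a narrower read at the same address yields the low bits. The helper names `hexdump` and `low_half_from_le` are made up for illustration.

```rust
/// Formats bytes the way a hexdump would show them, in memory order.
fn hexdump(bytes: &[u8]) -> String {
    bytes
        .iter()
        .map(|b| format!("{b:02x}"))
        .collect::<Vec<_>>()
        .join(" ")
}

/// Reads only the first two bytes of a little-endian encoding as a u16.
fn low_half_from_le(bytes: [u8; 4]) -> u16 {
    u16::from_le_bytes([bytes[0], bytes[1]])
}

fn main() {
    let value: u32 = 0x1234_5678;
    println!("big endian:    {}", hexdump(&value.to_be_bytes()));
    println!("little endian: {}", hexdump(&value.to_le_bytes()));
    println!("low 16 bits:   {:#06x}", low_half_from_le(value.to_le_bytes()));
}
```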
About type inference coming to the C language as well
Posted Dec 10, 2023 8:57 UTC (Sun) by swilmet (subscriber, #98424) [Link]
In my opinion, type inference for variable declarations should be used only sparingly, when the type of the variable is already visible (and quite long to write) on the right-hand side of the assignment. Writing the types of variables explicitly enhances code comprehension.
See this article that I wrote this night after reading this LWN article: About type inference
(the article is 2 pages long, a bit too long to copy here as a comment, I suppose).
About type inference coming to the C language as well
Posted Dec 10, 2023 9:00 UTC (Sun) by swilmet (subscriber, #98424) [Link]
About type inference coming to the C language as well
Posted Dec 10, 2023 11:52 UTC (Sun) by excors (subscriber, #95769) [Link]
Like, using `auto` instead of `const char*` or `ArrayList<String>` isn't a huge benefit, because those are pretty simple types. But when you're regularly writing code like:
for (std::map<std::string, std::string>::iterator it = m.begin(); it != m.end(); ++it) { ... }
then it gets quite annoying, since the type name makes up half the line, and it obscures the high-level intent of the code (which is simply to iterate over `m`). (And that's not the real type anyway; `std::string` is the templated `std::basic_string<char>`, and the `iterator` is a typedef which is documented to be a LegacyBidirectionalIterator which is a LegacyForwardIterator which is a LegacyIterator which specifies the `++it` operation etc, so in practice you're not going to figure out how the type behaves from the documentation - you're really going to need a type-aware text editor or IDE, at least until you've memorised enough of the typical library usage patterns. That's just an obligatory part of modern programming.)
Or in Rust you might rely on type inference like:
let v = line.split_ascii_whitespace().map(|s| s.parse().unwrap());
let vals: Vec<i32> = v.collect();
where you can see the important information (that it ends up with a vector of ints), and you can assume `v` is some sort of iterable thing but you don't care exactly what. Writing it explicitly would be something terrible like:
let v: std::iter::Map<std::str::SplitAsciiWhitespace<'_>, impl Fn(&str) -> i32> = line.split_ascii_whitespace().map(|s| s.parse().unwrap());
let vals: Vec<i32> = v.collect();
except that won't actually work because the `'_` is referring to a lifetime which I don't think there is any way to express in code; and the closure is actually an anonymous type (constructed by the compiler to contain any captured variables) which implements the `Fn` trait, and you can only use the `impl Trait` syntax in argument types (where it's a form of generics) and return types (where it's a kind of information hiding), not in variable bindings, so there's no way to name the closure type. Rust's statically-checked lifetimes and non-heap-allocated closures are useful features that simply can't work without type inference.
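As a sketch of the return-position `impl Trait` form mentioned above, the same pipeline can be wrapped in a function whose concrete iterator type is never written out. The function name `parse_ints` is hypothetical, and the code panics on malformed input just like the original snippet.

```rust
/// Returns an iterator of parsed integers without naming its concrete type.
/// Panics on a token that is not a valid i32, like the original snippet.
fn parse_ints(line: &str) -> impl Iterator<Item = i32> + '_ {
    line.split_ascii_whitespace().map(|s| s.parse().unwrap())
}

fn main() {
    // The binding's type is inferred; only the final Vec<i32> is spelled out.
    let v = parse_ints("10 20 30");
    let vals: Vec<i32> = v.collect();
    println!("{vals:?}");
}
```

The caller still relies on inference for `v`; the signature only promises "some iterator of `i32`" and hides the closure type.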
About type inference coming to the C language as well
Posted Dec 10, 2023 21:52 UTC (Sun) by tialaramex (subscriber, #21167) [Link]
Sure, I have no idea what "type" chars actually is, but it's clearly some sort of Iterator, and somebody named it chars, I feel entitled to assume it impl Iterator<Item = char> unless it's obvious in context that it doesn't.
If anything I think I more often resent needing to spell out types for e.g. constants where I'm obliged to specify that const MAX_EXPIRY: MyDayType = 398; rather than let the compiler figure out that's the only correct type. I don't hate that enough to think it should be changed, it makes sense, but I definitely run into it more often than I regret not knowing the type of chars in a construction like let chars = foo.bar().into_iter()
However, of course C has lots of footguns which I can imagine would be worsened with inference, so just because it was all rainbows and puppies in Rust doesn't mean the same will be true in C.
About type inference coming to the C language as well
Posted Dec 10, 2023 22:07 UTC (Sun) by mb (subscriber, #50428) [Link]
Yes, that is true.
Type inference works well in Rust due to its strict type system.
But a subset of Rust's type inference will probably work well in C.
About type inference coming to the C language as well
Posted Dec 10, 2023 23:00 UTC (Sun) by NYKevin (subscriber, #129325) [Link]
I would agree with this. The main concern I can think of is how C handles numeric conversions. They are messy, complicated, and I always have to look them up.[1] They can mostly be summarized as "promote everything to the narrowest type that can represent all values of both argument types, and if an integer, is at least as wide as int," but that summary is wrong (float usually *can't* represent all values of int, but C will just promote int to float anyway). Throwing type inference on top of that mess is probably just going to make things worse.
By contrast, Rust has no such logic. If you add i32 + i16, or any other situation where the types do not match, you just get a flat compiler error.
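The parenthetical about `float` can be checked with a short sketch. It is written in Rust, where the conversion has to be spelled out with `as`; in C the same int-to-float conversion is applied implicitly in mixed arithmetic. It assumes a 32-bit IEEE 754 `float` (Rust's `f32`), and `through_f32` is a made-up helper.

```rust
/// Converts an i32 to f32 and back, exposing any rounding on the way.
fn through_f32(value: i32) -> i32 {
    (value as f32) as i32
}

fn main() {
    let exact = 16_777_216; // 2^24, representable exactly in f32
    let inexact = 16_777_217; // 2^24 + 1, needs 25 significant bits
    println!("{exact} -> {}", through_f32(exact));
    println!("{inexact} -> {}", through_f32(inexact));
}
```

The second value comes back as 16777216, because an `f32` carries only 24 significant bits.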
I do wish Rust would let me write this:
let x: i32 = 1;
let y: i16 = 2;
let z: i32 = x + y.into(); // Compiler error!
(Presumably this is because you can also add i32 + &i32, and the compiler isn't quite smart enough to rule out that override.)
The compiler suggests writing this abomination, which does work:
let z: i32 = x + <i16 as Into<i32>>::into(y);
But at least you can write this:
let x: i32 = 1;
let y: i16 = 2;
let y32: i32 = y.into();
let z: i32 = x + y32;
About type inference coming to the C language as well
Posted Dec 11, 2023 3:08 UTC (Mon) by NYKevin (subscriber, #129325) [Link]
let z: i32 = x + i32::from(y);
Obviously I need to spend more time studying Rust, or maybe actually sit down and write a toy program in it.
Finally, I should note that you can write "y as i32", but `as` is less safe in general because it will silently do a narrowing conversion when the target type is smaller. from() and into() can only do conversions that never lose data, and there's also try_from()/try_into() if you want to handle overflow explicitly.
About type inference coming to the C language as well
Posted Dec 11, 2023 13:08 UTC (Mon) by gspr (subscriber, #91542) [Link]
And there's try_from().expect("Conversion failure") for those cases where you wanna say "man, I don't really wanna think about this, and I'm sure the one type converts to the other without loss in all cases my program experiences – but if I did overlook something, then at least abort with an error message instead of introducing silent errors".
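A minimal sketch of the conversion options discussed in this sub-thread, side by side. The helper names (`widen`, `narrow_with_as`, `narrow_checked`) are made up for illustration; the conversions themselves are the standard `From`, `as` and `TryFrom` forms.

```rust
use std::convert::TryFrom; // already in the prelude since the 2021 edition

/// Lossless widening: only compiles because every i16 fits in an i32.
fn widen(y: i16) -> i32 {
    i32::from(y)
}

/// `as` always compiles and silently wraps when the value does not fit.
fn narrow_with_as(x: i32) -> i16 {
    x as i16
}

/// `try_from` reports the overflow instead of hiding it.
fn narrow_checked(x: i32) -> Option<i16> {
    i16::try_from(x).ok()
}

fn main() {
    println!("{}", widen(2));
    println!("{}", narrow_with_as(70_000)); // wraps modulo 2^16
    println!("{:?}", narrow_checked(70_000));
    // Aborts with the given message if the value ever stops fitting.
    let small = i16::try_from(1_234_i32).expect("Conversion failure");
    println!("{small}");
}
```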
About type inference coming to the C language as well
Posted Dec 11, 2023 4:43 UTC (Mon) by swilmet (subscriber, #98424) [Link]
Both C++ and Rust have a large core language, while C has a small core language.
I see Rust more as a successor to C++. C programmers in general - I think - like the fact that C has a small core language. So in C the types remain small to write, and there are more function calls instead of using sophisticated core language features. C is thus more verbose, and verbosity can be seen as an advantage.
Maybe the solution is to create a SubC language: a subset of C that is safe (or at least safer). That's already partly the case with the compiler options, hardening efforts etc.
About type inference coming to the C language as well
Posted Dec 11, 2023 8:39 UTC (Mon) by NYKevin (subscriber, #129325) [Link]
I disagree with this, assuming that "safe" means "cannot cause UB outside of an unsafe block." A safe version of C needs at least the following:
* Lifetimes and borrow checking, which implies a type annotation similar to generics.
* Type inference, or else you have to write lifetime annotations everywhere.
* Box<T> or something equivalent to Box<T>, or else you can't put big objects on the heap and move their ownership around.
* Arc<RwLock<T>> or some equivalent, or else you have no reasonable escape hatch from the borrow checker (other than unsafe blocks).
* Rc<RefCell<T>> or some equivalent, or else you have to use the multithreaded escape hatch even in single-threaded code.
* And then there are many other optimizations such as using Mutex<T> instead of RwLock<T>, or OnceCell<T> instead of RefCell<T>. All of these have valid equivalents in C, and should be possible to represent in our hypothesized "safe C" (without needing more than a minimal amount of unsafe, preferably buried somewhere in the stdlib so that "regular" code can be safe).
I just don't see how you provide all of that flexibility without doing monomorphization, at which point you're already 80% of the way to reinventing Rust.
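For readers less familiar with the types in the list above, here is a short sketch of what three of them look like in use in Rust today. The function names are hypothetical; the point is only that each wrapper is generic over its payload type `T`, which is what any "safe C" equivalent would have to reproduce.

```rust
use std::cell::RefCell;
use std::rc::Rc;
use std::sync::{Arc, RwLock};
use std::thread;

/// Box<T>: a large object lives on the heap and its ownership can move.
fn boxed_sum() -> i32 {
    let big = Box::new([1i32; 1024]);
    let moved = big; // ownership transfer, no copy of the array
    moved.iter().sum()
}

/// Rc<RefCell<T>>: shared, mutable state within a single thread.
fn shared_single_threaded() -> Vec<i32> {
    let shared = Rc::new(RefCell::new(vec![1, 2]));
    let alias = Rc::clone(&shared);
    alias.borrow_mut().push(3); // borrow rules are checked at run time
    let result = shared.borrow().clone();
    result
}

/// Arc<RwLock<T>>: the multithreaded counterpart.
fn shared_across_threads() -> i32 {
    let counter = Arc::new(RwLock::new(0));
    let handles: Vec<_> = (0..4)
        .map(|_| {
            let counter = Arc::clone(&counter);
            thread::spawn(move || {
                *counter.write().unwrap() += 1;
            })
        })
        .collect();
    for handle in handles {
        handle.join().unwrap();
    }
    let total = *counter.read().unwrap();
    total
}

fn main() {
    println!("{} {:?} {}", boxed_sum(), shared_single_threaded(), shared_across_threads());
}
```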
About type inference coming to the C language as well
Posted Dec 11, 2023 11:10 UTC (Mon) by Sesse (subscriber, #53779) [Link]
About type inference coming to the C language as well
Posted Dec 14, 2023 10:55 UTC (Thu) by swilmet (subscriber, #98424) [Link]
About type inference coming to the C language as well
Posted Dec 14, 2023 10:57 UTC (Thu) by farnz (subscriber, #17727) [Link]
You're not going to get very far when you can't access arguments, or do I/O. Wuffs is deliberately limited to not doing that, because it's dangerous to mix I/O with file format parsing.
About type inference coming to the C language as well
Posted Dec 11, 2023 11:35 UTC (Mon) by swilmet (subscriber, #98424) [Link]
But why not try a C-to-Rust transpiler? (random idea).
By keeping a small core language with the C syntax, and having a new standard library that looks like Rust but uses more function calls instead.
The transpiler would "take" the new stdlib as part of the language, for performance reasons, and translates the function calls to Rust idioms.
A source-to-source compiler is of course not ideal, but that's how C++ was created ("C with classes" was initially translated to C code).
About type inference coming to the C language as well
Posted Dec 11, 2023 12:09 UTC (Mon) by farnz (subscriber, #17727) [Link]
You might want to look at the C2Rust project; the issue is that a clean transpiler to Rust has to use unsafe liberally, since C constructs translate to something that can't be represented in purely Safe Rust.
The challenge then becomes adding something like lifetimes (so that you can translate pointers to Rust references instead of Rust raw pointers) without "bloating" C. I suspect that it's impossible to have a tiny core language without pushing many problems into the domain of "the programmer simply must not make any relevant mistakes"; note, though, that this is not bi-directional, since a language with a big core can still push many problems into that domain.
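To make the contrast concrete, here is a hand-written sketch, not actual C2Rust output, of the same loop in two styles. The raw-pointer version resembles what a direct translation of C would need; the slice version is what idiomatic safe Rust looks like. Both function names are hypothetical.

```rust
/// Pointer-and-length style, as a direct translation of a C loop might look.
///
/// # Safety
/// `ptr` must point to at least `len` readable, initialized `i32` values.
unsafe fn sum_raw(ptr: *const i32, len: usize) -> i32 {
    let mut total = 0;
    for i in 0..len {
        // The compiler cannot check that `ptr.add(i)` stays in bounds.
        total += unsafe { *ptr.add(i) };
    }
    total
}

/// Slice style: the length travels with the reference and is checked.
fn sum_safe(values: &[i32]) -> i32 {
    values.iter().sum()
}

fn main() {
    let data = [1, 2, 3, 4];
    // The caller has to uphold the safety contract by hand.
    let raw = unsafe { sum_raw(data.as_ptr(), data.len()) };
    println!("{raw} {}", sum_safe(&data));
}
```

Going from the first form to the second requires knowing that the pointer and the length belong together, which is information the C source does not state.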
About type inference coming to the C language as well
Posted Dec 12, 2023 10:32 UTC (Tue) by swilmet (subscriber, #98424) [Link]
But I had the idea to convert (a subset of) C to _safe_ Rust, of course. Instead of some Rust keywords, operators etc (the core language), have C functions instead.
Actually the GLib/GObject project is looking to have a Rust-like way of handling things, see:
https://www.bassi.io/articles/2023/08/23/the-mirror/
(but a bit long to read, and one needs to know the GObject world to understand the blog post I think).
Anyway, that's an interesting topic for researchers. Then making it useful and consumable for real-world C projects is yet another task.
About type inference coming to the C language as well
Posted Dec 12, 2023 10:43 UTC (Tue) by farnz (subscriber, #17727) [Link]
The hard part is not the keywords and operators - it's the lifetime annotation system. Lifetimes are a check on what the programmer intended, so have to be possible to write as an annotation to pointer types in the C derived language, but then to be usable force you to have a generics system (since you want many things to be generic over a lifetime) with (at least) covariance and invariance possible to express.
And once you have a generics system that can express covariance and invariance for each item in a set of generic parameters, why wouldn't you allow that to be used for types as well as lifetimes? At which point, you have Rust traits and structs, and most of the complexity of Rust.
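A small Rust sketch of what "generic over a lifetime" means in practice, and how naturally the same mechanism extends to types. `Window` and `longest` are illustrative names, not taken from any library.

```rust
/// A view that is generic over both a lifetime `'a` and an element type `T`.
struct Window<'a, T> {
    items: &'a [T],
}

impl<'a, T> Window<'a, T> {
    /// The returned reference is tied to the borrowed data, not to `self`.
    fn first(&self) -> Option<&'a T> {
        self.items.first()
    }
}

/// The lifetime parameter states that the result borrows from the inputs.
fn longest<'a>(a: &'a str, b: &'a str) -> &'a str {
    if b.len() > a.len() { b } else { a }
}

fn main() {
    let numbers = vec![7, 8, 9];
    let first = {
        let window = Window { items: numbers.as_slice() };
        window.first() // outlives `window` because it borrows from `numbers`
    };
    println!("{:?} {}", first, longest("ab", "abc"));
}
```

The `<'a, T>` parameter list is the same syntax for both kinds of parameter, which is the point being made: once lifetimes need generics, type parameters come almost for free.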
About type inference coming to the C language as well
Posted Dec 12, 2023 11:34 UTC (Tue) by mb (subscriber, #50428) [Link]
That is not possible, except for very trivial cases.
The C code neither includes enough information (e.g. lifetimes) for that to work, nor is it usually structured in a way that would allow it.
Programming in Rust requires a different way of thinking and a different way of structuring your code. An automatic translation of the usual idiomatic C programs will fail so hard that it would be easier to rewrite it from scratch instead of translating it and then fixing the compile failures.
About type inference coming to the C language as well
Posted Dec 13, 2023 23:59 UTC (Wed) by swilmet (subscriber, #98424) [Link]
I started to learn Rust but dislike the fact that it has many core features ("high-level ergonomics"). It's probably possible to use Rust in a simplistic way though, except maybe if a library forces you to use the fancy features.
About type inference coming to the C language as well
Posted Dec 14, 2023 9:37 UTC (Thu) by farnz (subscriber, #17727) [Link]
You could avoid using those libraries, and limit yourself to libraries that have a "simple" enough interface for you (no_std libraries are a good thing to look for here, since they're designed with just core and maybe alloc in mind, not the whole of std) - bearing in mind that you don't need to care how those libraries are implemented if it's just about personal preference.
In general, though, I wouldn't be scared of a complex core language - all of that complexity has to be handled somewhere, and a complex core language can mean that complexity is being compiler-checked instead of human-checked.
About type inference coming to the C language as well
Posted Dec 14, 2023 11:07 UTC (Thu) by swilmet (subscriber, #98424) [Link]
"Soft"ware, they said :-)
About type inference coming to the C language as well
Posted Dec 10, 2023 12:03 UTC (Sun) by Wol (subscriber, #4433) [Link]
Have a variable type of "infer"? That way, an undeclared variable is still an error, but you can explicitly tell the compiler to decide for itself :-)
Cheers,
Wol
Modern C for Fedora (and the world)
Posted Dec 10, 2023 19:59 UTC (Sun) by geert (subscriber, #98403) [Link]
However, gcc still seems to add "move.l %a0,%d0" at the end of any function returning a pointer type.