Modern C for Fedora (and the World)

source link: https://lwn.net/Articles/954018/

Modern C for Fedora (and the world)

Posted Dec 8, 2023 17:21 UTC (Fri) by cjwatson (subscriber, #7322) [Link]

It was usually fatal on ia64, but crucially, it often wasn't fatal on amd64. I remember we implemented a special wrapper for Ubuntu builds to grep the build log for such warnings and forcibly fail the build if we spotted them ...

Modern C for Fedora (and the world)

Posted Dec 9, 2023 10:59 UTC (Sat) by fw (subscriber, #26023) [Link]

Even more surprising is int-conversion, which is introducing even more pointer clipping to 32 bits.

But it turns out that on x86-64, without PIE, global data, constants, and the heap all are in the first 32 bits of the address space. Even today, only the stack is outside that range. So you get surprisingly far with 32-bit pointers only. It really shouldn't work, but it does in many cases. But of course PIE changes that.

Modern C for Fedora (and the world)

Posted Dec 16, 2023 8:16 UTC (Sat) by mpr22 (subscriber, #60784) [Link]

> the heap

surely that depends on the size of your heap

Modern C for Fedora (and the world)

Posted Dec 8, 2023 17:32 UTC (Fri) by Paf (subscriber, #91811) [Link]

I don’t think so - very few architectures have 64 bit ‘int’. Like, hardly any. Int is largely stuck as a 32 bit type.

So the assumption that made most of that code work didn’t become wrong with the advent of 64 bit.

Fwiw, I work in a project that has long done Wall and Werror and so all of these constructs terrify me :)

Modern C for Fedora (and the world)

Posted Dec 8, 2023 19:34 UTC (Fri) by roc (subscriber, #30627) [Link]

Enabling -Wall and -Werror by default is problematic because it means your code breaks every time a compiler introduces a new warning under -Wall.

Though, maybe compilers have stopped adding warnings to -Wall and now only add to -Wextra instead? I wish I knew.

Modern C for Fedora (and the world)

Posted Dec 8, 2023 22:23 UTC (Fri) by NYKevin (subscriber, #129325) [Link]

> Enabling -Wall and -Werror by default is problematic because it means your code breaks every time a compiler introduces a new warning under -Wall.

Eh, it depends what you want to get out of Wall/Werror. If you're a distro, of course you don't want to use it, it will break all the packages all the time. If you're an upstream, and you also require zero lint errors (for whatever linter your project is using), then this is much less problematic. By the time something makes it into -Wall, the linters have probably been complaining about it for years, and so in practice, the amount of breakage when you upgrade to a new compiler is rather limited. And you always have the option of (temporarily) doing -Wall -Wno-foo if a particular warning causes issues.

Modern C for Fedora (and the world)

Posted Dec 9, 2023 7:56 UTC (Sat) by wtarreau (subscriber, #51152) [Link]

That's what we do on haproxy: we build with -Wall -Wextra everywhere, and in addition to this, developers and CI have -Werror enabled. Due to the diversity of distros used by developers (and the CI) we generally manage to make sure no warnings are left for distros at release time.

Modern C for Fedora (and the world)

Posted Dec 9, 2023 11:09 UTC (Sat) by Sesse (subscriber, #53779) [Link]

But upstream code tends to move into distros. :-) So shipping upstream with -Werror is pretty unhelpful. But running it _while developing_ is great, at least as long as you don't need to support like eight different obsolete C compilers with weird and different warning sets.

Modern C for Fedora (and the world)

Posted Dec 8, 2023 22:40 UTC (Fri) by fwiesweg (subscriber, #116364) [Link]

That probably depends very much on your codebase. If you are badly understaffed for the age and amount of your code, yes, probably stay away from it. That's about where I started with my projects a decade ago.

On the other hand, if you are able to keep up with the load, it's about the best thing you can do. With each modernization push, enforced by making warnings fail hard, the amount of runtime errors can be brought down considerably, and by now nearly all issues we have are caused by missing or disabled static checks.

Of course, updating was gruesome, tedious work, but it makes life afterwards much more relaxed and enjoyable. I even ran a Friday deployment today without being overly worried, something I'd never have done just five years ago. In the long run, -Wall was really worth it.

Modern C for Fedora (and the world)

Posted Dec 8, 2023 23:45 UTC (Fri) by roc (subscriber, #30627) [Link]

It's OK for *developers* to hit these errors, but it's broken for random users who just want to build upstream with a compiler that's newer than what the developers are using.

Modern C for Fedora (and the world)

Posted Dec 9, 2023 0:30 UTC (Sat) by pbonzini (subscriber, #60935) [Link]

Just make sure that -Werror is easy to disable and enabled only when building from a VCS checkout, or something like that.

Modern C for Fedora (and the world)

Posted Dec 9, 2023 1:02 UTC (Sat) by roc (subscriber, #30627) [Link]

People building from upstream are typically building from a VCS checkout so that doesn't help.

Currently we build with -Werror -Wall for CMAKE_BUILD_TYPE=DEBUG, and not for CMAKE_BUILD_TYPE=RELEASE. That's assuming developers build regularly with DEBUG and people who just want a working upstream build don't. It works out OK in practice. It doesn't seem ideal but maybe it's about as good as it can be.

Modern C for Fedora (and the world)

Posted Dec 9, 2023 8:15 UTC (Sat) by pm215 (subscriber, #98099) [Link]

The difficulty with doing it based on debug versus release is that often you want your debug build to be -O0 because the debugging experience is so much nicer, but that also turns off a lot of the data flow analysis that is needed for some of the warning categories.

Modern C for Fedora (and the world)

Posted Dec 9, 2023 11:11 UTC (Sat) by smcv (subscriber, #53363) [Link]

Is gcc -Og ("turn on optimizations that don't hurt debugging too much") a reasonable compromise for debug mode?

Modern C for Fedora (and the world)

Posted Dec 9, 2023 16:41 UTC (Sat) by pm215 (subscriber, #98099) [Link]

It's supposed to be, but in practice I have found it is not, which makes it pretty useless. I tried -Og, got burned (by gdb reporting it could not tell me the values of variables in my program because they had been "optimized away") and went back to -O0.

Unless the compiler authors commit to "-Og will never lose debug info that is present in -O0" I personally use and advise others to use -O0.

Modern C for Fedora (and the world)

Posted Dec 9, 2023 12:11 UTC (Sat) by kreijack (guest, #43513) [Link]

> People building from upstream are typically building from a VCS checkout so that doesn't help.

People building from upstream should be able to deal with these kinds of issues; usually this means reading a README file.
If not, they should use a distro package.

> Currently we build with -Werror -Wall for CMAKE_BUILD_TYPE=DEBUG, and not for CMAKE_BUILD_TYPE=RELEASE. That's assuming developers build regularly with DEBUG and people who just want a working upstream build don't. It works out OK in practice. It doesn't seem ideal but maybe it's about as good as it can be.

This is a sane principle.

Modern C for Fedora (and the world)

Posted Dec 9, 2023 8:35 UTC (Sat) by marcH (subscriber, #57642) [Link]

-Werror (and others) should be easy to turn on and off, this is very important.

What I found to work well is to have -Werror added only in pre-merge CI. Not having it by default makes prototyping more convenient.

This is consistent with running linters in pre-merge CI while not forcing developers to run them all the time.

None of this approach is specific to C.

Of course you need to have some pre-merge CI in the first place. If you don't even have that minimal level of CI then the project is basically unmaintained.

Modern C for Fedora (and the world)

Posted Dec 9, 2023 14:46 UTC (Sat) by mathstuf (subscriber, #69389) [Link]

Even there I find `-Werror` is not the best solution. We prefer to keep the warnings and allow the build to continue on after hitting an "error" and then fail the job at the very end if any warnings happened. This allows one to get more than one round of warnings out of CI at a time. Real failures still stop the build because continuing the build after a hard failure (`make -i`) is a recipe for chasing wild geese. The `-k` flag can be useful, but is a tradeoff between wasted CI cycles and comprehensive error reports.

Modern C for Fedora (and the world)

Posted Dec 9, 2023 17:54 UTC (Sat) by marcH (subscriber, #57642) [Link]

Showing all warnings is nice, but you don't want CI to babysit developers too much, otherwise there will always be a couple of lazy (and frankly: not very smart) developers who will "spam" CI because they can't be bothered to find out how to enable -Werror themselves. They wrongly think they save time that way, but they waste CI resources and in some cases even slow down the whole CI infrastructure. Been there, seen that.

This being said, the simplest and best solution is to compile twice: once without -Werror and once with -Werror. This can be in two separate (and clearly labeled) runs or even consecutively. The first run shows all warnings and the second blocks the merge.

This is a bit similar to the `make || make -j1` technique that avoids (real) errors being drowned by many threads and confusing developers.

Modern C for Fedora (and the world)

Posted Dec 9, 2023 21:43 UTC (Sat) by mathstuf (subscriber, #69389) [Link]

Developers aren't looking through build logs. Instead, CTest gathers the output, does some filtering (e.g., we ignore third party warnings in CI) and uploads it to CDash for viewing. It also obviates the need for the `-j1` trick. We also don't need the second `-Werror` run (which pollutes the build cache) and instead just get the (post-filtered) warning count from CTest and trigger a script failure if it is non-zero.

I'll do an initial run on all of the CI configurations to get a survey of what is broken and then focus on what is broken after that (I don't build all of the configurations locally to know anyways).

Modern C for Fedora (and the world)

Posted Dec 10, 2023 0:53 UTC (Sun) by marcH (subscriber, #57642) [Link]

> Instead, CTest gathers the output, does some filtering

If you have a good test framework that does all that for you then you should absolutely ignore my previous post. Not everyone is that lucky. I mean many projects don't even have any pre-merge CI at all (yet?). Remember that the main article is about Fedora and others stepping up to rescue orphaned projects coded in ancient C. In such a context my simple advice above definitely holds because it's just one extra line in your CI configuration. Super cheap and very high value and something people not familiar with CI may think about.

> Developers aren't looking through build logs.

They don't by default (assuming of course you have developers in the first place...)

They definitely do when there's a CI red light somewhere that threatens the merge of their code and maybe their deadline. In such a case I know from first-hand experience that they really enjoy the simple "tricks" I recommended above.

> and uploads it to CDash for viewing.

I don't know anything about CDash but I know neither GitHub nor Jenkins nor Gitlab has any "yellow light"/warning concept, it's either green/pass or red/fail. Running twice with and without -Werror also solves that display limitation problem extremely cheaply. Again: if you have a smarter and better viewer then by all means ignore my tricks.

> We also don't need the second `-Werror` run (which pollutes the build cache)

Curious what you mean here.

Modern C for Fedora (and the world)

Posted Dec 10, 2023 4:10 UTC (Sun) by mathstuf (subscriber, #69389) [Link]

> I don't know anything about CDash but I know neither GitHub nor Jenkins nor Gitlab has any "yellow light"/warning concept, it's either green/pass or red/fail. Running twice with and without -Werror also solves that display limitation problem extremely cheaply. Again: if you have a smarter and better viewer then by all means ignore my tricks.

GitLab-CI does have a "warning" mode with the `allow_failure` key[1]. We use exit code 47 to indicate "warnings happened" so that the testing can proceed even though the build made warning noise. There are issues with PowerShell exit code extraction and that always hard-fails, but that seems to be a gitlab-runner issue (it worked before we upgraded for other reasons). It's actually nifty because it still reports as a `failed` *state* and the `allow_failure` key on the job just changes the render and "can dependent jobs proceed" logic, so our merge robot just sees that state and says "no" to merging.

> > We also don't need the second `-Werror` run (which pollutes the build cache)

> Curious what you mean here.

We have a shared cache for CI (`sccache`-based; `buildcache` on Windows). Adding another set of same-object output for a different set of flags just removes space otherwise ideally suited for storing other build results (*maybe* the object is deduplicated, but it doesn't seem necessary to me; probably backend-dependent anyways).

[1] https://docs.gitlab.com/ee/ci/yaml/#allow_failure

Modern C for Fedora (and the world)

Posted Dec 9, 2023 15:25 UTC (Sat) by Paf (subscriber, #91811) [Link]

Just a little context for my specific project, because I agree with the points you’re making about inconvenience. I work on an out of tree (GPL licensed!) file system project. We support a decent variety of distributions, but we have extensive CI so we catch stuff early, and since it’s a file system, it’s not common for people to try to build for/with something we don’t support. (And our formal position on support for kernels is “there’s a (fairly wide) list we test, otherwise good luck and we’re accepting patches”).

So we have circumstances that are a bit different, I think.

Modern C for Fedora (and the world)

Posted Dec 9, 2023 1:24 UTC (Sat) by Wol (subscriber, #4433) [Link]

> On the other hand, if you are able to keep up with the load, it's about the best thing you can do. With each modernization push, enforced by making warnings fail hard, the amount of runtime errors can be brought down considerably, and by now nearly all issues we have are caused by missing or disabled static checks.

That's what I did with a code base. Just worked through the codebase adding -W3 to each module in turn, and cleared all the errors. It took time, but the quality of the code base shot up, and loads of unexplained errors just disappeared :-)

Cheers,
Wol

Modern C for Fedora (and the world)

Posted Dec 11, 2023 14:48 UTC (Mon) by rgmoore (✭ supporter ✭, #75) [Link]

A reasonable way to think about this is to treat all those compiler warnings as technical debt. Paying off that technical debt will be painful, especially if you have a lot of it, but it's probably worth it in the long run. The big cost will be when you take a project that has allowed the warnings to pile up and suddenly force everyone to spend time fixing those warnings rather than develop anything new. Dealing with new warnings as compilers change their mind about what deserves a warning will be more manageable. The main problem in that case is letting the compiler writers dictate when you pay off your technical debt rather than making the decision yourself.

Modern C for Fedora (and the world)

Posted Dec 11, 2023 16:46 UTC (Mon) by Wol (subscriber, #4433) [Link]

If the project isn't too big ...

We had a project where we couldn't suppress a particular warning (MSC v6, -W4, bought in library, unused arguments. Catch 22, we could fix warning A, but the fix triggered warning B, cue endless loop).

Anyways, our standards said "All warnings must be explained and understood". So that one we just ignored. There's no reason a project can't say "it's an old warning, we haven't got round to fixing it". But any new warning in modified code is an instant QA failure.

Cheers,
Wol

Modern C for Fedora (and the world)

Posted Dec 9, 2023 16:24 UTC (Sat) by jwarnica (subscriber, #27492) [Link]

Given the fundamental theory of CI/CD is to fix discovered problems "now", allowing them to be ignored seems counterproductive. By doing it always (er, "continuously") you force yourself to discover implicit assumptions you did not even know you made. That is as much true of code you wrote as it is of external libraries and the tools you use to build things.

Introducing a new compiler version is a significant step. Perhaps you will need a development branch to work through that, but you should either never change compiler versions, or actually do all that is needed when you do....

Which could well be disabling particular checks in the build process. But if you said Wall, then you have implicitly deferred to the compiler people's taste.

Modern C for Fedora (and the world)

Posted Dec 9, 2023 22:42 UTC (Sat) by quotemstr (subscriber, #45331) [Link]

What we really need is a date- or release-based -Wall. For example, one might write -Wall=gcc-12 and get all the warnings enabled with -Wall in GCC 12 and not any additional warnings GCC 13 might introduce. You could safely combine -Werror with -Wall this way.

Modern C for Fedora (and the world)

Posted Dec 10, 2023 11:25 UTC (Sun) by joib (subscriber, #8541) [Link]

It might *help*, but it's probably not foolproof either, as a newer GCC release might have improved the -Wfoo diagnostics path to catch cases that the older release didn't catch. Of course you could, in principle at least, split the enhanced version into a separate -Wfoo-gcc-XY, or something like that, which is only activated when -Wall=gcc-XX isn't enabled. But I suspect GCC wouldn't want to commit itself to such a level of backward compatibility in the warnings. And of course if there's ever any refactoring of some particular warning, having to provide perfect backward compatibility for the previous 27 major releases would probably be prohibitive.

Modern C for Fedora (and the world)

Posted Dec 14, 2023 12:04 UTC (Thu) by spacefrogg (subscriber, #119608) [Link]

This is trivial to achieve. Just record the GCC version in your build scripts and set -Wall, when it updates. Then, update the recorded version once you are satisfied. This doesn't need any upstream support.

Modern C for Fedora (and the world)

Posted Dec 8, 2023 17:36 UTC (Fri) by Hello71 (subscriber, #103412) [Link]

With register-based calling conventions, longs below 2^32 are usually passed unscathed. Furthermore, I believe x86-64 SysV ABI leaves the extended register bits (32-63) undefined, so in many cases they might accidentally have the right values.

Modern C for Fedora (and the world)

Posted Dec 11, 2023 8:10 UTC (Mon) by jengelh (subscriber, #33263) [Link]

In addition, the use of little endian causes low values behind a pointer argument or a stack-passed argument ("dword ptr [rsp+0x20]") to be accidentally "in the right spot". [In other words, whenever *(uint64_t *)ptr == *(uint32_t *)ptr.] It is a shame that big endian systems are going away.

Modern C for Fedora (and the world)

Posted Dec 13, 2023 1:52 UTC (Wed) by marcH (subscriber, #57642) [Link]

This is exactly why little endian and big endian are not just two sides of the same coin.

Big endian is more "human-friendly" because you can read hexdumps "as is" (because humans use big endian too)

Little endian is more "computer-friendly" because of what you just explained.

In other words, Gulliver is wrong here.

About type inference coming to the C language as well

Posted Dec 10, 2023 8:57 UTC (Sun) by swilmet (subscriber, #98424) [Link]

In my opinion, type inference for variable declarations should be used only sparingly, when the type of the variable is already visible (and quite long to write) on the right-hand side of the assignment. Writing the types of variables explicitly enhances code comprehension.

See this article that I wrote this night after reading this LWN article: About type inference

(the article is 2 pages long, a bit too long to copy here as a comment, I suppose).

About type inference coming to the C language as well

Posted Dec 10, 2023 9:00 UTC (Sun) by swilmet (subscriber, #98424) [Link]

(Oops, posted my comment as a sub-comment instead of a new top-level one, I clicked on the wrong reply button…)

About type inference coming to the C language as well

Posted Dec 10, 2023 11:52 UTC (Sun) by excors (subscriber, #95769) [Link]

I think an important point that's missing from your argument is that modern languages have much more sophisticated type systems than C, with features like generics, and modern libraries make use of those type systems, so type names are very commonly much longer (and sometimes impossible) to write. If you don't have type inference, the language will be restricted to much simpler types, and you lose the correctness and performance benefits of having more information statically encoded in types.

Like, using `auto` instead of `const char*` or `ArrayList<String>` isn't a huge benefit, because those are pretty simple types. But when you're regularly writing code like:

for (std::map<std::string, std::string>::iterator it = m.begin(); it != m.end(); ++it) { ... }

then it gets quite annoying, since the type name makes up half the line, and it obscures the high-level intent of the code (which is simply to iterate over `m`). (And that's not the real type anyway; `std::string` is the templated `std::basic_string<char>`, and the `iterator` is a typedef which is documented to be a LegacyBidirectionalIterator which is a LegacyForwardIterator which is a LegacyIterator which specifies the `++it` operation etc, so in practice you're not going to figure out how the type behaves from the documentation - you're really going to need a type-aware text editor or IDE, at least until you've memorised enough of the typical library usage patterns. That's just an obligatory part of modern programming.)

Or in Rust you might rely on type inference like:

let v = line.split_ascii_whitespace().map(|s| s.parse().unwrap());
let vals: Vec<i32> = v.collect();

where you can see the important information (that it ends up with a vector of ints), and you can assume `v` is some sort of iterable thing but you don't care exactly what. Writing it explicitly would be something terrible like:

let v: std::iter::Map<std::str::SplitAsciiWhitespace<'_>, impl Fn(&str) -> i32> = line.split_ascii_whitespace().map(|s| s.parse().unwrap());
let vals: Vec<i32> = v.collect();

except that won't actually work because the `'_` is referring to a lifetime which I don't think there is any way to express in code; and the closure is actually an anonymous type (constructed by the compiler to contain any captured variables) which implements the `Fn` trait, and you can only use the `impl Trait` syntax in argument types (where it's a form of generics) and return types (where it's a kind of information hiding), not in variable bindings, so there's no way to name the closure type. Rust's statically-checked lifetimes and non-heap-allocated closures are useful features that simply can't work without type inference.

About type inference coming to the C language as well

Posted Dec 10, 2023 21:52 UTC (Sun) by tialaramex (subscriber, #21167) [Link]

Yeah, I'm a caveman, often working in Vim without any special development features, yet I am not bothered at all to see e.g. let chars = foo.bar().into_iter();

Sure, I have no idea what "type" chars actually is, but it's clearly some sort of Iterator, and somebody named it chars, I feel entitled to assume it impl Iterator<Item = char> unless it's obvious in context that it doesn't.

If anything I think I more often resent needing to spell out types for e.g. constants where I'm obliged to specify that const MAX_EXPIRY: MyDayType = 398; rather than let the compiler figure out that's the only correct type. I don't hate that enough to think it should be changed, it makes sense, but I definitely run into it more often than I regret not knowing the type of chars in a construction like let chars = foo.bar().into_iter()

However, of course C has lots of footguns which I can imagine would be worsened with inference, so just because it was all rainbows and puppies in Rust doesn't mean the same will be true in C.

About type inference coming to the C language as well

Posted Dec 10, 2023 22:07 UTC (Sun) by mb (subscriber, #50428) [Link]

>so just because it was all rainbows and puppies in Rust doesn't mean the same will be true in C.

Yes, that is true.

Type inference works well in Rust due to its strict type system.
But a subset of Rust's type inference will probably work well in C.

About type inference coming to the C language as well

Posted Dec 10, 2023 23:00 UTC (Sun) by NYKevin (subscriber, #129325) [Link]

> However, of course C has lots of footguns which I can imagine would be worsened with inference, so just because it was all rainbows and puppies in Rust doesn't mean the same will be true in C.

I would agree with this. The main concern I can think of is how C handles numeric conversions. They are messy, complicated, and I always have to look them up.[1] They can mostly be summarized as "promote everything to the narrowest type that can represent all values of both argument types, and if an integer, is at least as wide as int," but that summary is wrong (float usually *can't* represent all values of int, but C will just promote int to float anyway). Throwing type inference on top of that mess is probably just going to make things worse.

By contrast, Rust has no such logic. If you add i32 + i16, or any other situation where the types do not match, you just get a flat compiler error.

I do wish Rust would let me write this:

let x: i32 = 1;
let y: i16 = 2;
let z: i32 = x + y.into(); // Compiler error!

(Presumably this is because you can also add i32 + &i32, and the compiler isn't quite smart enough to rule out that override.)

The compiler suggests writing this abomination, which does work:

let z: i32 = x + <i16 as Into<i32>>::into(y);

But at least you can write this:

let x: i32 = 1;
let y: i16 = 2;
let y32: i32 = y.into();
let z: i32 = x + y32;

[1]: https://en.cppreference.com/w/c/language/conversion

About type inference coming to the C language as well

Posted Dec 11, 2023 3:08 UTC (Mon) by NYKevin (subscriber, #129325) [Link]

And, after posting this comment, I've realized that the reason into() doesn't work is because you're *supposed* to write this instead:

let z: i32 = x + i32::from(y);

Obviously I need to spend more time studying Rust, or maybe actually sit down and write a toy program in it.

Finally, I should note that you can write "y as i32", but that's less safe because it will silently do a narrowing conversion. from() and into() can only do conversions that never lose data, and there's also try_from()/try_into() if you want to handle overflow explicitly.

About type inference coming to the C language as well

Posted Dec 11, 2023 13:08 UTC (Mon) by gspr (subscriber, #91542) [Link]

> and there's also try_from()/try_into() if you want to handle overflow explicitly.

And there's try_from().expect("Conversion failure") for those cases where you wanna say "man, I don't really wanna think about this, and I'm sure the one type converts to the other without loss in all cases my program experiences – but if I did overlook something, then at least abort with an error message instead of introducing silent errors".

About type inference coming to the C language as well

Posted Dec 11, 2023 4:43 UTC (Mon) by swilmet (subscriber, #98424) [Link]

It's true that in C++ and Rust, types can be quite long to write.

Both C++ and Rust have a large core language, while C has a small core language.

I see Rust more as a successor to C++. C programmers in general - I think - like the fact that C has a small core language. So in C the types remain small to write, and there are more function calls instead of using sophisticated core language features. C is thus more verbose, and verbosity can be seen as an advantage.

Maybe the solution is to create a SubC language: a subset of C that is safe (or at least safer). That's already partly the case with the compiler options, hardening efforts etc.

About type inference coming to the C language as well

Posted Dec 11, 2023 8:39 UTC (Mon) by NYKevin (subscriber, #129325) [Link]

> Maybe the solution is to create a SubC language: a subset of C that is safe (or at least safer). That's already partly the case with the compiler options, hardening efforts etc.

I disagree with this, assuming that "safe" means "cannot cause UB outside of an unsafe block." A safe version of C needs at least the following:

* Lifetimes and borrow checking, which implies a type annotation similar to generics.
* Type inference, or else you have to write lifetime annotations everywhere.
* Box<T> or something equivalent to Box<T>, or else you can't put big objects on the heap and move their ownership around.
* Arc<RwLock<T>> or some equivalent, or else you have no reasonable escape hatch from the borrow checker (other than unsafe blocks).
* Rc<RefCell<T>> or some equivalent, or else you have to use the multithreaded escape hatch even in single-threaded code.
* And then there are many other optimizations such as using Mutex<T> instead of RwLock<T>, or OnceCell<T> instead of RefCell<T>. All of these have valid equivalents in C, and should be possible to represent in our hypothesized "safe C" (without needing more than a minimal amount of unsafe, preferably buried somewhere in the stdlib so that "regular" code can be safe).

I just don't see how you provide all of that flexibility without doing monomorphization, at which point you're already 80% of the way to reinventing Rust.

About type inference coming to the C language as well

Posted Dec 11, 2023 11:10 UTC (Mon) by Sesse (subscriber, #53779) [Link]

I guess that if you banned threads and pointers (presumably requiring lots of globals) and made all array access bounds-checked and all data zero-initialized, you could get a safe C subset without going there. How useful it would be would be a different thing...

About type inference coming to the C language as well

Posted Dec 11, 2023 13:52 UTC (Mon) by farnz (subscriber, #17727) [Link]

If you're not careful, you end up with something like Wuffs. A perfectly useful language in some domains, but deliberately limited in scope to stop you writing many classes of bug.

About type inference coming to the C language as well

Posted Dec 14, 2023 10:55 UTC (Thu) by swilmet (subscriber, #98424) [Link]

Seems useful to write command-line programs, for example.

About type inference coming to the C language as well

Posted Dec 14, 2023 10:57 UTC (Thu) by farnz (subscriber, #17727) [Link]

You're not going to get very far when you can't access arguments, or do I/O. Wuffs is deliberately limited to not doing that, because it's dangerous to mix I/O with file format parsing.

About type inference coming to the C language as well

Posted Dec 11, 2023 11:35 UTC (Mon) by swilmet (subscriber, #98424) [Link]

I'm not an expert in programming languages design and security-related matters.

But why not try a C-to-Rust transpiler? (random idea).

By keeping a small core language with the C syntax, and having a new standard library that looks like Rust but uses more function calls instead.

The transpiler would "take" the new stdlib as part of the language, for performance reasons, and translates the function calls to Rust idioms.

A source-to-source compiler is of course not ideal, but that's how C++ was created ("C with classes" was initially translated to C code).

About type inference coming to the C language as well

Posted Dec 11, 2023 12:09 UTC (Mon) by farnz (subscriber, #17727) [Link]

You might want to look at the C2Rust project; the issue is that a clean transpiler to Rust has to use unsafe liberally, since C constructs translate to something that can't be represented in purely Safe Rust.

The challenge then becomes adding something like lifetimes (so that you can translate pointers to Rust references instead of Rust raw pointers) without "bloating" C. I suspect that it's impossible to have a tiny core language without pushing many problems into the domain of "the programmer simply must not make any relevant mistakes"; note, though, that this is not bi-directional, since a language with a big core can still push many problems into that domain.

About type inference coming to the C language as well

Posted Dec 12, 2023 10:32 UTC (Tue) by swilmet (subscriber, #98424) [Link]

I didn't know about C2Rust; it shows that my random idea is not stupid after all :)

But I had the idea to convert (a subset of) C to _safe_ Rust, of course. Instead of some Rust keywords, operators etc (the core language), have C functions instead.

Actually the GLib/GObject project is looking to have a Rust-like way of handling things, see:
https://www.bassi.io/articles/2023/08/23/the-mirror/
(but a bit long to read, and one needs to know the GObject world to understand the blog post I think).

Anyway, that's an interesting topic for researchers. Then making it useful and consumable for real-world C projects is yet another task.

About type inference coming to the C language as well

Posted Dec 12, 2023 10:43 UTC (Tue) by farnz (subscriber, #17727) [Link]

The hard part is not the keywords and operators - it's the lifetime annotation system. Lifetimes are a check on what the programmer intended, so it has to be possible to write them as annotations on pointer types in the C-derived language; but to be usable, they then force you to have a generics system (since you want many things to be generic over a lifetime) in which (at least) covariance and invariance can be expressed.

And once you have a generics system that can express covariance and invariance for each item in a set of generic parameters, why wouldn't you allow that to be used for types as well as lifetimes? At which point, you have Rust traits and structs, and most of the complexity of Rust.
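To make the lifetime point concrete, here is a small sketch of a C function whose contract cannot be stated in the type system today. The "borrows" comment is a hypothetical annotation, not an existing C feature or tool syntax; the Rust signature quoted in the comment is the usual textbook form, shown only for comparison.

```c
#include <string.h>

/*
 * Hypothetical annotation: the result borrows from either a or b,
 * so it must not outlive either argument.
 *
 * Rust states the same contract in the signature itself:
 *   fn longest<'a>(a: &'a str, b: &'a str) -> &'a str
 *
 * In C the compiler sees only "const char *", so nothing stops a
 * caller from keeping the result after one of the inputs is freed.
 */
const char *longest(const char *a, const char *b)
{
    /* On a tie, the first argument is returned. */
    return strlen(a) >= strlen(b) ? a : b;
}
```

Note that even this one function is already "generic over a lifetime": the annotation has to relate the result to both parameters, whatever their actual lifetimes turn out to be at each call site.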

About type inference coming to the C language as well

Posted Dec 12, 2023 11:34 UTC (Tue) by mb (subscriber, #50428) [Link]

>But I had the idea to convert (a subset of) C to _safe_ Rust, of course.

That is not possible, except for very trivial cases.

The C code neither includes enough information (e.g. lifetimes) for that to work, nor is it usually structured in a way that would allow it.

Programming in Rust requires a different way of thinking and a different way of structuring your code. An automatic translation of the usual idiomatic C program will fail so hard that it would be easier to rewrite it from scratch than to translate it and then fix the compile failures.

About type inference coming to the C language as well

Posted Dec 13, 2023 23:59 UTC (Wed) by swilmet (subscriber, #98424) [Link]

The C syntax alone is not enough, but comments with annotations can be added, and become part of the language.

I started to learn Rust but dislike the fact that it has many core features ("high-level ergonomics"). It's probably possible to use Rust in a simplistic way though, except maybe if a library forces you to use the fancy features.

About type inference coming to the C language as well

Posted Dec 14, 2023 9:37 UTC (Thu) by farnz (subscriber, #17727) [Link]

You could avoid using those libraries, and limit yourself to libraries that have a "simple" enough interface for you (no_std libraries are a good thing to look for here, since they're designed with just core and maybe alloc in mind, not the whole of std) - bearing in mind that you don't need to care how those libraries are implemented if it's just about personal preference.

In general, though, I wouldn't be scared of a complex core language - all of that complexity has to be handled somewhere, and a complex core language can mean that complexity is being compiler-checked instead of human-checked.

About type inference coming to the C language as well

Posted Dec 14, 2023 11:07 UTC (Thu) by swilmet (subscriber, #98424) [Link]

The codebases that I maintain already use between two and three/four main programming languages (welcome to GNOME, I should say). At some point I wanted to write new code in Rust, but it means adding more complexity and being less productive for some time while learning the language.

"Soft"ware, they said :-)

About type inference coming to the C language as well

Posted Dec 10, 2023 12:03 UTC (Sun) by Wol (subscriber, #4433) [Link]

> In my opinion, type inference for variable declarations should be used only sparingly, when the type of the variable is already visible (and quite long to write) on the right-hand side of the assignment. Writing the types of variables explicitly enhances code comprehension.

Have a variable type of "infer"? That way, an undeclared variable is still an error, but you can explicitly tell the compiler to decide for itself :-)

Cheers,
Wol

Modern C for Fedora (and the world)

Posted Dec 10, 2023 19:59 UTC (Sun) by geert (subscriber, #98403) [Link]

It could have been fatal on m68k, too, as integer types are returned in register d0, and pointer types in register a0.
However, gcc still seems to add "move.l %a0,%d0" at the end of any function returning a pointer type.
