Few lesser known tricks, quirks and features of C
source link: https://blog.joren.ga/less-known-c
While almost everybody learns at some point that you can abuse C to achieve OOP in it, there are some tricks, quirks and features (some quite fundamental to the language!) which seem to throw even experienced developers off track. Thus I did a sloppy job of gathering some of them in this post, with even sloppier short explanations and/or examples (or quotes thereof).
Array pointers
Because arrays decay to pointers, a real pointer to an array is usually not needed:
int arr[10];
int* ap0 = arr; // array decay-to-pointer
// ap0[2] = ...
int (*ap1)[10] = &arr; // proper pointer to array
// (*ap1)[2] = ...
But the ability to allocate a big array on the heap is nice:
int (*ap3)[900000] = malloc(sizeof *ap3);
With pointers, even a VLA can find its use:
int (*ap4)[n] = malloc(sizeof *ap4);
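A minimal sketch of the VLA case, assuming a compiler with VLA support (the function name sum_heap_vla and the fill pattern are made up for illustration). The point is that sizeof *ap is evaluated at run time and yields n * sizeof(int):

```c
#include <stdlib.h>

/* Allocates n ints on the heap through a pointer to a VLA, fills them
   with 0..n-1 and returns their sum, or -1 if malloc fails.
   n must be greater than zero. */
long sum_heap_vla(size_t n)
{
    int (*ap)[n] = malloc(sizeof *ap);   /* sizeof *ap == n * sizeof(int) */
    if (ap == NULL) {
        return -1;
    }
    long sum = 0;
    for (size_t i = 0; i < n; ++i) {
        (*ap)[i] = (int)i;
        sum += (*ap)[i];
    }
    free(ap);
    return sum;
}
```

Calling sum_heap_vla(10) allocates ten ints and returns 45.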
Comma operator
The comma operator is used to separate two or more expressions that are included where only one expression is expected. When the set of expressions has to be evaluated for a value, only the right-most expression is considered.
For example, b = (a=3, a+2); first assigns the value 3 to a, and then assigns a+2 to b. So, at the end, b contains 5 while a contains 3.
Wikipedia lists a few more examples.
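One place where the comma operator shows up in everyday code is the step expression of a for loop. The sketch below is my own illustration (not one of the Wikipedia examples), and reverse_ints is a hypothetical helper name:

```c
#include <stddef.h>

/* Reverses arr in place. In the loop header, "++i, --j" uses the comma
   operator to advance two indices in the single step expression.
   (The comma in "i = 0, j = len - 1" is only a declarator separator.) */
void reverse_ints(int *arr, size_t len)
{
    if (len < 2) {
        return;
    }
    for (size_t i = 0, j = len - 1; i < j; ++i, --j) {
        int tmp = arr[i];
        arr[i] = arr[j];
        arr[j] = tmp;
    }
}
```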
Digraphs, trigraphs and alternative tokens
C code may not be portable, but the language itself is probably more portable than any other. There are systems using e.g. EBCDIC encoding instead of ASCII; to support them, C has digraphs and trigraphs, multi-character sequences that the compiler treats as other characters. The alternative-token macros (and, or, not, ...) are defined in <iso646.h>.
Digraph | Equivalent | Trigraph | Equivalent | Macro | Equivalent
---|---|---|---|---|---
<: | [ | ??= | # | and | &&
:> | ] | ??( | [ | and_eq | &=
<% | { | ??/ | \ | bitand | &
%> | } | ??) | ] | bitor | \|
%: | # | ??' | ^ | compl | ~
%:%: | ## | ??< | { | not | !
 | | ??! | \| | not_eq | !=
 | | ??> | } | or | \|\|
 | | ??- | ~ | or_eq | \|=
 | | | | xor | ^
 | | | | xor_eq | ^=
Despite some opposition, the C Standard Committee decided to remove support for trigraphs in C23.
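To get a feel for how this reads in practice, here is a small sketch written with digraphs and the <iso646.h> macros only. The function name second_if_positive is made up; the code is equivalent to a plain function using [ ], { }, && and !:

```c
%:include <iso646.h>

/* Digraph spelling of:
   int arr[2] = { 7, 9 };
   if (arr[0] > 0 && !(arr[1] == 0)) { return arr[1]; } */
int second_if_positive(void)
<%
    int arr<:2:> = <% 7, 9 %>;
    if (arr<:0:> > 0 and not (arr<:1:> == 0)) <%
        return arr<:1:>;
    %>
    return 0;
%>
```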
Designated initializer
These allow you to specify which elements of an object (array, structure, union) are to be initialized by the values following. The order does not matter!
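struct Foo {
    int x, y;
    const char* bar;
};
void f(void)
{
    // elements 2-4 and 6-7 are zero; the array has 10 elements
    int arr[] = { 1, 2, [5] = 9, [9] = 5, [8] = 8 };
    struct Foo f = { .y = 23, .bar = "barman", .x = -38 };
    // designators can be mixed with positional values and nested
    struct Foo foos[] = {
        [10] = { 8, 8, "bar1" },
        [8] = { 1, 8, "bar3" },
        [12] = { .x = 9, .y = 8 },   // .bar is left NULL
    };
}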
Compound literals
A compound literal looks like a cast of a brace-enclosed initializer list. Its value is an object of the type specified in the cast, containing the elements specified in the initializer.
#include <stdio.h>
struct Foo { int x, y; };
void bar(struct Foo p)
{
printf("%d, %d", p.x, p.y);
}
int main(void)
{
bar((struct Foo){2, 3});
return 0;
}
Compound literals are lvalues
(struct Foo){0};           // {0} instead of {} for pre-C23 compilers
((struct Foo){0}).x = 4;
&(struct Foo){0};
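Because a compound literal is an unnamed object, you can take its address and modify it through a pointer. A minimal sketch, with scale and scaled_sum as hypothetical helper names:

```c
struct Foo { int x, y; };

/* Doubles both fields through a pointer. */
void scale(struct Foo *p)
{
    p->x *= 2;
    p->y *= 2;
}

int scaled_sum(void)
{
    /* The literal lives until the end of the enclosing block,
       so the pointer stays valid for the rest of this function. */
    struct Foo *p = &(struct Foo){ 2, 3 };
    scale(p);
    return p->x + p->y;   /* 4 + 6 */
}
```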
Multi-character constants
Their values are implementation-defined, so it is usually best to avoid them. That being said, using them as self-documenting enums can be quite handy when you may need to deal with raw memory dumps later on.
enum state {
waiting = 'WAIT',
running = 'RUN!',
stopped = 'STOP',
};
For example, on my machine I could locate 'WAIT' in a hex dump of the binary like this (note the reversed byte order: 54 49 41 57 reads as "TIAW"):
00001120: c3 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 .ff...........@.
00001130: f3 0f 1e fa e9 67 ff ff ff 55 48 89 e5 48 83 ec .....g...UH..H..
00001140: 10 c7 45 fc 54 49 41 57 8b 45 fc 89 c6 48 8d 05 ..E.TIAW.E...H..
00001150: b0 0e 00 00 48 89 c7 b8 00 00 00 00 e8 cf fe ff ....H...........
00001160: ff b8 00 00 00 00 c9 c3 f3 0f 1e fa 48 83 ec 08 ............H...
Bit fields
Declares a member with explicit width, in bits. Adjacent bit field members may be packed to share and straddle the individual bytes.
struct cat {
unsigned int legs : 3; // 3 bits for legs (0-4 fit in 3 bits)
unsigned int lives : 4; // 4 bits for lives (0-9 fit in 4 bits)
};
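A small sketch of what the explicit width means in practice (store_lives is a made-up helper): a value that does not fit into an unsigned bit field is reduced modulo 2^width.

```c
struct cat {
    unsigned int legs  : 3;
    unsigned int lives : 4;
};

/* Stores v in the 4-bit field and reads it back. Values that do not fit
   in an unsigned bit field wrap modulo 2^width (here modulo 16). */
unsigned int store_lives(unsigned int v)
{
    struct cat c = { .legs = 4, .lives = 0 };
    c.lives = v;
    return c.lives;
}
```

So store_lives(9) gives back 9, while store_lives(17) gives back 1.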
0 bit fields
An example taken from the SO answer:
I discovered recently 0 bit fields.
struct { int a : 3; int b : 2; int : 0; int c : 4; int d : 3; };
which will give a layout of:
000aaabb 0ccccddd
instead of, without the :0,
0000aaab bccccddd
The 0 width field tells that the following bit fields should be set on the next atomic entity (char).
volatile type qualifier
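This qualifier tells the compiler that a variable may be accessed by means other than the current code (e.g. by a signal handler, by code running in another thread, or because it is a memory-mapped I/O register), and therefore not to optimize away reads and writes to it. Note that volatile alone does not make accesses atomic and does not synchronize threads.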
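The classic standard-C use is a flag set by a signal handler. The sketch below is only an illustration (the names got_signal, on_signal and signal_was_seen are mine); it raises SIGINT at itself so the handler runs synchronously:

```c
#include <signal.h>

/* Written by the handler, read by normal code: volatile keeps the
   compiler from assuming the value cannot change behind its back. */
static volatile sig_atomic_t got_signal = 0;

static void on_signal(int signum)
{
    (void)signum;
    got_signal = 1;
}

/* Installs the handler, raises SIGINT at ourselves and reports the flag.
   Returns -1 if the handler could not be installed. */
int signal_was_seen(void)
{
    got_signal = 0;
    if (signal(SIGINT, on_signal) == SIG_ERR) {
        return -1;
    }
    raise(SIGINT);             /* the handler runs before raise() returns */
    signal(SIGINT, SIG_DFL);   /* restore default handling */
    return got_signal;
}
```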
restrict type qualifier
By adding this type qualifier, a programmer hints to the compiler that for the lifetime of the pointer, no other pointer will be used to access the object to which it points. This allows the compiler to make optimizations (for example, vectorization) that would not otherwise have been possible.
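A typical shape of such a function, as a sketch (add_arrays is a hypothetical name, not from any library):

```c
#include <stddef.h>

/* restrict promises that the memory written through dst is not also
   reached through a or b, so the compiler is free to load, add and
   store several elements at once. Breaking the promise is undefined
   behavior. */
void add_arrays(size_t n, float *restrict dst,
                const float *restrict a, const float *restrict b)
{
    for (size_t i = 0; i < n; ++i) {
        dst[i] = a[i] + b[i];
    }
}
```

Whether the loop actually gets vectorized depends on the compiler and optimization flags.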
register storage-class specifier
It suggests that the compiler store the declared variable in a CPU register (or some other faster location) instead of in random-access memory. The address of a variable declared with register cannot be taken (but the sizeof operator can be applied).
Nowadays register is usually meaningless, as modern compilers place variables in registers when appropriate regardless of whether the hint is given. It may sometimes be useful on embedded systems, but even then the compiler will probably provide better optimizations.
Flexible array member
From Wikipedia:
struct vectord {
short len; // there must be at least one other data member
double arr[]; // the flexible array member must be last
// The compiler may reserve extra padding space here,
// like it can between struct members.
};
struct vectord *vector = malloc(...);
vector->len = ...;
for (int i = 0; i < vector->len; ++i) {
vector->arr[i] = ...; // transparently uses the right type (double)
}
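The Wikipedia snippet leaves the malloc argument out. One common way to size the allocation, shown here only as a sketch (vectord_new is a made-up constructor name), is the size of the header plus len elements:

```c
#include <stdlib.h>

struct vectord {
    short len;
    double arr[];   /* flexible array member, must be last */
};

/* Allocates the header plus room for len doubles in a single block and
   zeroes the elements. Returns NULL on failure; release with free(). */
struct vectord *vectord_new(short len)
{
    if (len < 0) {
        return NULL;
    }
    struct vectord *v = malloc(sizeof *v + (size_t)len * sizeof v->arr[0]);
    if (v == NULL) {
        return NULL;
    }
    v->len = len;
    for (short i = 0; i < len; ++i) {
        v->arr[i] = 0.0;
    }
    return v;
}
```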
%n format specifier
This StackOverflow answer presents it reasonably well:
%n returns the current position of the imaginary cursor used when printf() formats its output.
int pos1, pos2;
const char* str_of_unknown_len = "we don't care about the length of this";
printf("Write text of unknown %n(%s)%n length\n", &pos1, str_of_unknown_len, &pos2);
printf("%*s\\%*s/\n", pos1, " ", pos2-pos1-2, " ");
printf("%*s", pos1+1, " ");
for (int i = pos1+1; i < pos2-1; ++i) {
    putc('-', stdout);
}
putc('\n', stdout);
will have following output
Write text of unknown (we don't care about the length of this) length
                      \                                      /
                       --------------------------------------
Granted a little bit contrived but can have some uses when making pretty reports.
Interlacing syntactic constructs
The following is syntactically correct C code:
#include <stdio.h>
int main()
{
    int n = 3;
    int i = 0;
    switch (n % 2) {
    case 0:
        do {
            ++i;
    case 1:
            ++i;
        } while (--n > 0);
    }
    printf("%d\n", i); // 5
}
I know goto-phobic programmers who use it like this:
switch (x) {
case 1:
    // 1 specific code
    if (0) {
case 2:
        // 2 specific code
    }
    // common for 1 and 2
}
The most famous usage of this quirk/"feature" is Duff's device:
send(to, from, count)
register short *to, *from;
register count;
{
    register n = (count + 7) / 8;
    switch (count % 8) {
    case 0: do { *to = *from++;
    case 7:      *to = *from++;
    case 6:      *to = *from++;
    case 5:      *to = *from++;
    case 4:      *to = *from++;
    case 3:      *to = *from++;
    case 2:      *to = *from++;
    case 1:      *to = *from++;
            } while (--n > 0);
    }
}
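The original is pre-ANSI C and writes every item to the same location (to is never incremented, since it pointed at a memory-mapped output register). As a sketch of the same idea for copying between ordinary buffers in modern C (duff_copy is my own name, and the zero-count guard is my addition):

```c
#include <stddef.h>

/* Copies count shorts from `from` to `to`, unrolled eight times.
   Unlike the original, `to` is advanced as well, because here it is an
   ordinary buffer rather than a single output register. */
void duff_copy(short *to, const short *from, size_t count)
{
    if (count == 0) {
        return;   /* without this guard the loop body would still run */
    }
    size_t n = (count + 7) / 8;
    switch (count % 8) {
    case 0: do { *to++ = *from++;
    case 7:      *to++ = *from++;
    case 6:      *to++ = *from++;
    case 5:      *to++ = *from++;
    case 4:      *to++ = *from++;
    case 3:      *to++ = *from++;
    case 2:      *to++ = *from++;
    case 1:      *to++ = *from++;
            } while (--n > 0);
    }
}
```

The switch jumps into the middle of the loop to handle the count % 8 leftover items first; every later pass copies a full eight.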
Constant string concatenation
You don't need sprintf() (nor strcat()!) to concatenate string literals:
#define WORLD "World!"
const char* s = "Hello " WORLD "\n"
"It's a lovely day, "
"innit?";
Using && and || as conditionals
If you write Shell scripts, you know what I mean.
#include <stdio.h>
#include <stdbool.h>
#include <ctype.h> // for isdigit()
int main(void)
{
1 && puts("Hello");
0 && puts("I won't");
1 && puts("World!");
0 && puts("be printed");
1 || puts("I won't be printed either");
0 || puts("But I will!");
true && (9 > 2) && puts("9 is bigger than 2");
isdigit('9') && puts("9 is a digit");
isdigit('n') && puts("n is a digit") || puts("n is NOT a digit!");
return 0;
}
The compiler will probably scream warnings at you as it's really uncommon to do this in C code.
Compile-time assumption-checking using enums
#define D 1
#define DD 2
enum CompileTimeCheck
{
MAKE_SURE_DD_IS_TWICE_D = 1/(2*(D) == (DD)),
MAKE_SURE_DD_IS_POW2 = 1/((((DD) - 1) & (DD)) == 0)
};
Can be useful for libraries with compile-time configurable constants.
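The enum trick works because a failed check becomes a division by zero inside a constant expression, which the compiler must reject. Since C11 the same checks can also be written with _Static_assert, which adds a readable message. A sketch of that alternative (a different mechanism, not the enum technique itself):

```c
#define D  1
#define DD 2

/* Each failing condition stops compilation with the given message. */
_Static_assert(2 * (D) == (DD), "DD must be twice D");
_Static_assert((((DD) - 1) & (DD)) == 0, "DD must be a power of two");
```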
Ad hoc struct declaration in the return type of a function
You can define structs in (at first glance) very random places:
#include <stdio.h>
struct Foo { int a, b, c; } make_foo(void) {
struct Foo ret = { .c = 3 };
ret.a = 11 + ret.c;
ret.b = ret.a * 3;
return ret;
}
int main()
{
struct Foo x = make_foo();
printf("%d\n", x.a + x.b + x.c);
return 0;
}
"Nested" struct
definition is not kept nested
#include <stdio.h>
struct Foo {
int x;
struct Bar {
int y;
};
};
int main()
{
struct Bar s = { 34 }; // correct
// struct Foo.Bar s; // wrong
printf("%d\n", s.y);
return 0;
}
Flat Initializer Lists
int arr[3][3] = { 1, 2, 3, 4, 5, 6, 7, 8, 9 };
// = { {1,2,3}, {4,5,6}, {7,8,9} };
struct Foo {
const char *name;
int age;
};
struct Foo records[] = {
"John", 20,
"Bertha", 40,
"Andrew", 30,
};
Static array indices in function parameter declarations
Except in certain contexts, an unsubscripted array name (for example, region instead of region[4]) represents a pointer whose value is the address of the first element of the array, provided that the array has previously been declared. An array type in the parameter list of a function is also converted to the corresponding pointer type. Information about the size of the argument array is lost when the array is accessed from within the function body.
To preserve this information, which is useful for optimization, C99 allows you to declare the index of the argument array using the static keyword. The constant expression specifies the minimum pointer size that can be used as an assumption for optimizations. This particular usage of the static keyword is highly prescribed. The keyword may only appear in the outermost array type derivation and only in function parameter declarations. If the caller of the function does not abide by these restrictions, the behavior is undefined.
The following examples show how the feature can be used.
int n;
void foo(int arr[static 10]);       // arr points to the first of at least 10 ints
void foo(int arr[const 10]);        // arr is a const pointer
void foo(int arr[const]);           // const pointer to int
void foo(int arr[static const n]);  // arr points to at least n ints (VLA)
void foo(int p[static 1]); is effectively a standard way to declare that p must be a non-null pointer.
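A function definition using this could look as follows (a sketch; sum_first_ten is a hypothetical name). Some compilers also warn when a call site passes an array that is visibly too short, or a null pointer:

```c
/* The caller promises that arr points to at least 10 ints,
   so reading arr[0]..arr[9] needs no size or NULL check here. */
int sum_first_ten(const int arr[static 10])
{
    int sum = 0;
    for (int i = 0; i < 10; ++i) {
        sum += arr[i];
    }
    return sum;
}
```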
Macro Overloading by Argument List Length
#include <stdio.h>
#include "cmoball.h"
#define NoA(...) CMOBALL(FOO, __VA_ARGS__)
#define FOO_3(x,y,z) "Three"
#define FOO_2(x,y) "Two"
#define FOO_1(x) "One"
#define FOO_0() "Zero"
int main()
{
puts(NoA());
puts(NoA(1));
puts(NoA(1,1));
puts(NoA(1,1,1));
return 0;
}
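The example above relies on the external cmoball.h header. To show roughly how such dispatch can work, here is a hand-rolled sketch for one to three arguments; it is not how cmoball.h is implemented (I have not checked), and every macro and function name in it is made up. Unlike the example above, it does not handle the zero-argument case:

```c
/* Count 1 to 3 arguments: the arguments push the trailing numbers to
   the right, so the correct count lands in the N slot. */
#define NARGS_PICK(p1, p2, p3, N, ...) N
#define NARGS(...) NARGS_PICK(__VA_ARGS__, 3, 2, 1, 0)

/* Two-level paste, so NARGS(...) is expanded before ## is applied. */
#define CAT_(a, b) a##b
#define CAT(a, b) CAT_(a, b)

int max_1(int a) { return a; }
int max_2(int a, int b) { return a > b ? a : b; }
int max_3(int a, int b, int c) { return max_2(max_2(a, b), c); }

/* max_of(...) expands to a call of max_1, max_2 or max_3. */
#define max_of(...) CAT(max_, NARGS(__VA_ARGS__))(__VA_ARGS__)
```

With these definitions, max_of(2, 9) expands to max_2(2, 9).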
Function types
Function pointers ought to be well known, but as we know the syntax is a bit awkward. On the other hand, fewer people know that you can (as with most things in C) create a typedef for a function type.
#include <stdio.h>
int main()
{
typedef double fun_t(double);
fun_t sin, cos, sqrt;
fun_t* ftpt = &sqrt;
printf("%lf\n", ftpt(4)); // 2.000000
return 0;
}
X-Macros
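The idea of X-macros is to keep a list of items in one macro and expand it several times with different definitions of X, so the generated pieces cannot drift out of sync. A minimal sketch (the COLORS list and all names are illustrative):

```c
/* Single source of truth: each X(...) entry is one color. */
#define COLORS \
    X(RED) \
    X(GREEN) \
    X(BLUE)

/* First expansion: enum constants RED, GREEN, BLUE, then a count. */
#define X(name) name,
enum color { COLORS COLOR_COUNT };
#undef X

/* Second expansion: matching table of strings "RED", "GREEN", "BLUE". */
#define X(name) #name,
static const char *const color_names[] = { COLORS };
#undef X

/* Returns the name of c, or "?" if c is out of range. */
const char *color_name(enum color c)
{
    return ((unsigned)c < COLOR_COUNT) ? color_names[c] : "?";
}
```

Adding a color now means adding a single X(...) line; the enum and the name table follow automatically.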
Named function parameters
#include <stdio.h>
struct _foo_args {
int num;
const char* text;
};
#define foo(...) _foo((struct _foo_args){ __VA_ARGS__ })
int _foo(struct _foo_args args)
{
puts(args.text);
return args.num * 2;
}
int main(void)
{
int result = foo(.text = "Hello!", .num = 8);
return 0;
}
Combining default, named and positional arguments
Using compound literals and macros to create named arguments (…):
typedef struct { int a,b,c,d; } FooParam;
#define foo(...) foo((FooParam){ __VA_ARGS__ })
void (foo)(FooParam p);
adding default arguments is also quite easy:
#define foo(...) foo((FooParam){ .a=1, .b=2, .c=3, .d=4, __VA_ARGS__})
But now positional arguments don't work anymore, and there may be situations where you want to support both options. But I recently realized that you can make them work by adding a dummy parameter:
typedef struct { int _; int a,b,c,d; } FooParam;
#define foo(...) foo((FooParam){ .a=1, .b=2, .c=3, .d=4, ._=0, __VA_ARGS__})
Now, foo can be called in the following ways:
foo();            // a=1, b=2, c=3, d=4
foo(.a=4, .b=5);  // a=4, b=5, c=3, d=4
foo(4, 5);        // a=4, b=5, c=3, d=4
foo(4, 5, .d=8);  // a=4, b=5, c=3, d=8
The dummy parameter isn't needed when you have arguments that are required to be passed by name:
typedef struct { int alwaysNamed; int a,b,c,d; } FooParam;
#define foo(...) foo((FooParam){.a=1,.b=2,.c=3,.d=4, .alwaysNamed=5, __VA_ARGS__})
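To check the dummy-parameter variant, here is a self-contained sketch. The body of foo (packing the four fields into one number) is my own invention, added only to make the results easy to inspect. The trick relies on later initializers overriding earlier ones for the same member, which some compilers warn about (GCC has -Woverride-init, for instance):

```c
typedef struct { int _; int a, b, c, d; } FooParam;

/* Defined before the macro of the same name, so this definition is not
   macro-expanded. Packs the fields into one number: a=1,b=2,c=3,d=4
   becomes 1234. */
int foo(FooParam p)
{
    return p.a * 1000 + p.b * 100 + p.c * 10 + p.d;
}

/* Defaults first, then ._ = 0, so positional arguments continue at a. */
#define foo(...) foo((FooParam){ .a = 1, .b = 2, .c = 3, .d = 4, ._ = 0, __VA_ARGS__ })
```

With this, foo() yields 1234 and foo(4, 5, .d = 8) yields 4538.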
Abusing unions for grouping things into namespaces
Suppose that you have a struct with a bunch of fields, and you want to deal with some of them all together at once under a single name; perhaps you want to conveniently copy them as a block through struct assignment.
By using unions you can access both a.field2 and a.sub (and a.field2 is the same as a.sub.field2) without any macros.
struct a {
    int field1;
    union {
        struct { int field2; int field3; };
        struct { int field2; int field3; } sub;
    };
};
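A short sketch of the block copy this enables, assuming a C11 compiler (anonymous structs and unions are a C11 feature); copy_group is a hypothetical helper:

```c
struct a {
    int field1;
    union {
        struct { int field2; int field3; };       /* direct access */
        struct { int field2; int field3; } sub;   /* access as a block */
    };
};

/* Copies only the field2/field3 group from src to dst;
   field1 of dst is left untouched. */
void copy_group(struct a *dst, const struct a *src)
{
    dst->sub = src->sub;
}
```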
2023-02-19, Copyright © Jorengarenar