Legitimate Use of Variable Length Arrays
source link: https://www.tuicool.com/articles/N3MNF3j
Go to the source link to view the article. You can view the picture content, updated content and better typesetting reading experience. If the link is broken, please click the button below to view the snapshot at that time.
The C99 (ISO/IEC 9899:1999) standard of C introduced a new, powerful
feature called Variable Length Arrays (VLAs). The size of an array with
automatic storage duration (i.e. stack allocated) can be determined at
run time. Each instance of the array may even have a different length.
Unlike alloca()
, they’re a sanctioned form of dynamic stack
allocation.
At first glance, VLAs seem convenient, useful, and efficient. Heap allocations have a small cost because the allocator needs to do some work to find or request some free memory, and typically the operation must be synchronized since there may be other threads also making allocations. Stack allocations are trivial and fast by comparison: Allocation is a matter of bumping the stack pointer, and no synchronization is needed.
For example, here’s a function that non-destructively finds the median of a buffer of floats:
/* note: nmemb must be non-zero */ float median(const float *a, size_t nmemb) { float copy[nmemb]; memcpy(copy, a, sizeof(a[0]) * nmemb); qsort(copy, nmemb, sizeof(copy[0]), floatcmp); return copy[nmemb / 2]; }
It uses a VLA, copy
, as a temporary copy of the input for sorting. The
function doesn’t know at compile time how big the input will be, so it
cannot just use a fixed size. With a VLA, it efficiently allocates
exactly as much memory as needed on the stack.
Well, sort of. If nmemb
is too large, then the VLA will silently
overflow the stack. By silent I mean that the program has no way to
detect it and avoid it. In practice, it can be a lot louder, from a
segmentation fault in the best case, to an exploitable vulnerability in
the worst case:
stack clashing
. If an attacker can control nmemb
, they might choose a value that causes copy
to overlap with
other allocations, giving them control over those values as well.
If there’s any risk that nmemb
is too large, it must be guarded.
#define COPY_MAX 4096 float median(const float *a, size_t nmemb) { if (nmemb > COPY_MAX) abort(); /* or whatever */ float copy[nmemb]; memcpy(copy, a, sizeof(a[0]) * nmemb); qsort(copy, nmemb, sizeof(copy[0]), floatcmp); return copy[nmemb / 2]; }
However, if median
is expected to safely accommodate COPY_MAX
elements, it may as well always
allocate an array of this size. If it
can’t, then that’s not a safe maximum.
float median(const float *a, size_t nmemb) { if (nmemb > COPY_MAX) abort(); float copy[COPY_MAX]; memcpy(copy, a, sizeof(a[0]) * nmemb); qsort(copy, nmemb, sizeof(copy[0]), floatcmp); return copy[nmemb / 2]; }
And rather than abort, you might still want to support arbitrary input sizes:
float median(const float *a, size_t nmemb) { float buf[COPY_MAX]; float *copy = buf; if (nmemb > COPY_MAX) copy = malloc(sizeof(a[0]) * nmemb); memcpy(copy, a, sizeof(a[0]) * nmemb); qsort(copy, nmemb, sizeof(copy[0]), floatcmp); float result = copy[nmemb / 2]; if (copy != buf) free(copy); return result; }
Then small inputs are fast, but large inputs still work correctly. This is called small size optimization .
If the correct solution ultimately didn’t use a VLA, then what good are they? In general, VLAs not useful. They’re time bombs . VLAs are nearly always the wrong choice. You must be careul to check that they don’t exceed some safe maximum, and there’s no reason not to always use the maximum. This problem was realized for the C11 standard (ISO/IEC 9899:2011) where VLAs were made optional. A program containing a VLA will not necessarily compile on a C11 compiler.
Some purists also object to a special exception required for VLAs: The sizeof
operator may evaluate its operand, and so it does not always
evaluate to compile-time constant. If the operand contains a VLA, then
the result depends on a run-time value.
Because they’re optional, it’s best to avoid even trivial VLAs like this:
float median(const float *a, size_t nmemb) { int max = 4096; if (nmemb > max) abort(); float copy[max]; memcpy(copy, a, sizeof(a[0]) * nmemb); qsort(copy, nmemb, sizeof(copy[0]), floatcmp); return copy[nmemb / 2]; }
It’s easy to prove that the array length is always 4096, but technically
this is still a VLA. That would still be true even if max
were const
int
, because the array length still isn’t a constant integral
expression.
VLA overhead
Finally, there’s also the problem that VLAs just aren’t as efficient as you might hope. A function that does dynamic stack allocation requires additional stack management. It must track additional memory addresses and will require extra instructions.
void fixed(int n) { if (n <= 1<<14) { volatile char buf[1<<14]; buf[n - 1] = 0; } } void dynamic(int n) { if (n <= 1<<14) { volatile char buf[n]; buf[n - 1] = 0; } }
Compiled with gcc -Os
and viewed with objdump -d -Mintel
:
0000000000000000 <fixed>: 0: 81 ff 00 40 00 00 cmp edi,0x4000 6: 7f 19 jg 21 <fixed+0x21> 8: ff cf dec edi a: 48 81 ec 88 3f 00 00 sub rsp,0x3f88 11: 48 63 ff movsxd rdi,edi 14: c6 44 3c 88 00 mov BYTE PTR [rsp+rdi*1-0x78],0x0 19: 48 81 c4 88 3f 00 00 add rsp,0x3f88 20: c3 ret 21: c3 ret 0000000000000022 <dynamic>: 22: 81 ff 00 40 00 00 cmp edi,0x4000 28: 7f 23 jg 4d <dynamic+0x2b> 2a: 55 push rbp 2b: 48 63 c7 movsxd rax,edi 2e: ff cf dec edi 30: 48 83 c0 0f add rax,0xf 34: 48 63 ff movsxd rdi,edi 37: 48 83 e0 f0 and rax,0xfffffffffffffff0 3b: 48 89 e5 mov rbp,rsp 3e: 48 89 e2 mov rdx,rsp 41: 48 29 c4 sub rsp,rax 44: c6 04 3c 00 mov BYTE PTR [rsp+rdi*1],0x0 48: 48 89 d4 mov rsp,rdx 4b: c9 leave 4c: c3 ret 4d: c3 ret
Note the use of a base pointer, rbp
and leave
, in the second
function in order to dynamically track the stack frame. (Hmm, in both
cases GCC could easily shave off the extra ret
at the end of each
function. Missed optimization?)
The story is even worse when stack clash protection is enabled
( -fstack-clash-protection
). The compiler generates extra code to probe
every page of allocation in case one of those pages is a guard page.
That’s also more complex when the allocation is dynamic. The VLA version
more than doubles in size (from 44 bytes to 101 bytes)!
Safe and useful variable length arrays
There is one convenient, useful, and safe form of VLAs: a pointer to a VLA. It’s convenient and useful because it makes some expressions simpler. It’s safe because there’s no arbitrary stack allocation.
Pointers to arrays are a rare sight in C code, whether variable length or not. That’s because, the vast majority of the time, C programmers implicitly rely on array decay : arrays quietly “decay” into pointers to their first element the moment you do almost anything with them. Also because they’re really awkward to use.
For example, the function sum3
takes a pointer to an array of exactly
three elements.
int sum3(int (*array)[3]) { return (*array)[0] + (*array)[1] + (*array)[2]; }
The parentheses are necessary because, without them, array
would be an
array of pointers — a type far more common than a pointer to an array.
To index into the array, first the pointer to the array must be
dereferenced to the array value itself, then this intermediate array is
indexed triggering array decay. Conceptually there’s quite a bit to it,
but, in practice, it’s all as efficient as the conventional approach to sum3
that accepts a plain int *
.
The caller must take the address of an array of exactly the right length:
int buf[] = {1, 2, 4}; int r = sum3(&buf);
Or if dynamically allocating the array:
int (*array)[3] = malloc(sizeof(*array)); (*array)[0] = 1; (*array)[1] = 2; (*array)[2] = 4; int r = sum3(array); free(array);
The mandatory parentheses and strict type requirements make this awkward and rarely useful. However, with VLAs perhaps it’s worth the trouble! Consider an NxN matrix expressed using a pointer to a VLA:
int n = /* run-time value */; /* TODO: Check for integer overflow? See note. */ float (*identity)[n][n] = malloc(sizeof(*identity)); if (identity) { for (int y = 0; y < n; y++) { for (int x = 0; x < n; x++) { (*identity)[y][x] = x == y; } } }
When indexing, the parentheses are weird, but the indices have the
convenient [y][x]
format. The non-VLA alternative is to compute a 1D
index manually from 2D indices ( y*n+x
):
int n = /* run-time value */; /* TODO: Check for integer overflow. */ float *identity = malloc(sizeof(*identity) * n * n); if (identity) { for (int y = 0; y < n; y++) { for (int x = 0; x < n; x++) { identity[y*n + x] = x == y; } } }
Note: What’s the behavior in the VLA version when n
is so large that sizeof(*identity)
doesn’t fit in a size_t
? I couldn’t find anything
in the standard about it. Both GCC and Clang both implicitly check for
overflow and, when it occurs, the expression evaluates to zero. The
undefined behavior sanitizer doesn’t complain when this happens.
So VLAs might be worth the trouble when using pointers to multi-dimensional, dynamically-allocated arrays. However, I’m still judicious about their use due to reduced portability. As a practical example, MSVC famously does not, and likely will never will, support VLAs.
Recommend
About Joyk
Aggregate valuable and interesting links.
Joyk means Joy of geeK