How x86_64 addresses memory
source link: https://blog.yossarian.net/2020/06/13/How-x86_64-addresses-memory
Jun 13, 2020
Tags: programming
Today I’m going to write up one small (and yet still remarkably complicated) fragment of x86_64’s instruction semantics: memory addressing.
Specifically, I’m going to write up the different ways in which x86_64 allows the user to address memory via just one instruction: mov.
I won’t attempt to cover other instructions that can touch memory (which is pretty much all of them, thanks CISC), ones that write massive chunks of memory (looking at you, fxsave), or any adjacent subjects (code models, position independent code, binary relocations). I also won’t even try to cover historical addressing modes or modes that work when an x86_64 processor isn’t in 64-bit mode (i.e., any modes other than long mode with 64-bit code).
Some constraints
Despite (or perhaps thanks to?) the legacy hell that is x86_64’s instruction encoding, there are some constraints on how memory is addressed.
First, the good news:
- At a high enough level, there are really only two addressing modes on x86_64.
- Both addressing modes require all registers to be the same size as each other. In other words, we can’t do something weird like mixing 64, 32, and 16-bit registers to produce an effective address: there simply isn’t room in the x86_64 encoding to do so.
Now, the bad news:
- One of those addressing modes is still stupidly complicated.
- All registers have to be the same size as each other, but they don’t have to match the processor mode’s native width. In particular, we can use 32-bit registers instead of 64-bit ones by including the address-size prefix byte (0x67) in our encoding.
“Scale-Index-Base-Displacement” addressing
I call this mode “Scale-Index-Base-Displacement” because I have no idea what else to call it.
As far as I can tell, neither Intel nor AMD actually considers this to be a singular mode; instead, they refer to it as a general collection of related modes with a wide variety of different encodings.
But we’re not talking about encodings today: we’re talking about semantics, and semantically each of these related modes falls back to some combination of four parameters:
- Scale: A 2-bit constant factor that is either 1, 2, 4, or 8.
- Index: Any general purpose register (rax, rbx, &c.), except rsp, which can’t be used as an index.
- Base: Any general purpose register.
- Displacement: A signed integral offset. This is normally limited to 32 bits even in 64-bit mode, but can be 64 bits with a few select encodings. More on that later.
Various combinations of the four (including all four) are valid. Here are the valid combinations, in roughly increasing order of complexity:
- Displacement
- Base
- Base + Index
- Base + Displacement
- Base + Index + Displacement
- Base + (Index * Scale)
- (Index * Scale) + Displacement
- Base + (Index * Scale) + Displacement
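To pin down the arithmetic, here’s a minimal C sketch of how all eight combinations reduce to one formula: Base + (Index * Scale) + Displacement, with absent parts treated as zero. The struct and function names are hypothetical; they don’t come from Intel’s or AMD’s manuals. The sketch only models the arithmetic, not the encoding.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of a Scale-Index-Base-Displacement operand.
   Field names are illustrative, not taken from vendor documentation. */
struct sib_operand {
    int has_base;    /* nonzero if a base register is present */
    uint64_t base;   /* value held in the base register */
    int has_index;   /* nonzero if an index register is present */
    uint64_t index;  /* value held in the index register */
    unsigned scale;  /* 1, 2, 4, or 8; only meaningful with an index */
    int64_t disp;    /* displacement, already sign-extended to 64 bits */
};

/* Base + (Index * Scale) + Displacement, with absent parts treated as zero.
   Unsigned arithmetic wraps modulo 2^64, like the address calculation does. */
static uint64_t effective_address(const struct sib_operand *op) {
    uint64_t ea = (uint64_t)op->disp;
    if (op->has_base)
        ea += op->base;
    if (op->has_index) {
        /* The 2-bit scale field can only encode these four factors. */
        assert(op->scale == 1 || op->scale == 2 || op->scale == 4 || op->scale == 8);
        ea += op->index * op->scale;
    }
    return ea;
}
```

For instance, a base of 0x1000, an index of 3, a scale of 8, and a displacement of 0x10 give 0x1028. A base of 0x7ffc0000 with a displacement of -8 (think [rbp - 8]) gives 0x7ffbfff8.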
Let’s go through them one by one.
Displacement
This is arguably the simplest addressing mechanism in the x86 family: the displacement field is treated as an absolute memory address.
Unfortunately, it’s also almost completely useless on x86_64. Remember that note about displacements almost always being 32 bits? That means you can’t represent an arbitrary absolute address, since an absolute x86_64 address is 64 bits (really 48, but whatever) and generally won’t fit in the displacement.
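Concretely, the CPU sign-extends a 32-bit displacement to 64 bits, so only addresses in the lowest or highest 2 GiB of the address space can be written as one. Here is a minimal sketch of that check; the helper name is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: can this absolute address be encoded as a 32-bit
   displacement? Because disp32 is sign-extended to 64 bits, only the lowest
   2 GiB and the highest 2 GiB of the address space are representable. */
static bool fits_in_disp32(uint64_t addr) {
    return addr <= UINT64_C(0x000000007fffffff)   /* lowest 2 GiB */
        || addr >= UINT64_C(0xffffffff80000000);  /* highest 2 GiB */
}
```

0xff passes the check. A typical high userspace address like 0x00007ffff7a00000 doesn’t.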
There’s one exception to this: x86_64 allows for a 64-bit displacement with the a* registers (rax, eax, ax, and al).
In Intel syntax:
    ; store the qword at 0x00000000000000ff into rax
    mov rax, [0xff]
    ; store the dword at 0x00000000000000ff into eax
    mov eax, [0xff]
    ; store the word at 0x00000000000000ff into ax
    mov ax, [0xff]
    ; store the byte at 0x00000000000000ff into al
    mov al, [0xff]
gas (the GNU assembler) refers to these as movabs in both 32-bit and 64-bit modes.
Why would I (or my compiler) use this mode?
First of all, for code model reasons that aren’t relevant to this post. Eli Bendersky has a fantastic blog post on those.
More concretely: most programs have at least a few static addresses that are determined at compile-time, like global variables.
For example, this trivial program:
    long x = 100;

    long foo() {
        long *y = &x;
        return *y;
    }
…yields:
    push rbp
    mov rbp, rsp
    movabs rax, offset x            ; here! (strictly, the imm64 form: it loads x's address rather than reading memory)
    mov qword ptr [rbp - 8], rax
    mov rax, qword ptr [rbp - 8]
    mov rax, qword ptr [rax]
    pop rbp
    ret
(View it on Godbolt.)
Base
Addressing via the base register adds one layer of indirection over absolute addressing: instead of an absolute address encoded into the instruction’s displacement field, an address is loaded from the specified general-purpose register (any GPR! Hooray!).
This indirection allows us to do absolute addressing with an arbitrary destination register via the following pattern:
    ; store the immediate (not displacement) into rbx
    mov rbx, 0xacabacabacabacab
    ; store the qword at the address stored in rbx into rcx
    mov rcx, [rbx]
…but we have relatively few reasons to do that, given the richer addressing modes we’re about to see.
Why would I (or my compiler) use this mode?
Because sometimes we have a calculated address already lying around from another operation, and we just want to use it.
The disassembly from the displacement sample above has a good example of this as well:
mov rax, qword ptr [rax]
Base + Index
This is just like addressing via the base register, except that we also add in the value of the index register.
For example:
    ; store the qword in rcx into the memory address computed
    ; as the sum of the values in rax and rbx
    mov [rax + rbx], rcx
Why would I (or my compiler) use this mode?
I had a hard time contriving an example for this, which of course means that my coworkers immediately found one:
    int foo(char *buf, int index) {
        return buf[index];
    }
…which yields:
    push rbp
    mov rbp, rsp
    mov qword ptr [rbp - 8], rdi
    mov dword ptr [rbp - 12], esi
    mov rax, qword ptr [rbp - 8]       ; rax is buf
    movsxd rcx, dword ptr [rbp - 12]   ; rcx is index
    movsx eax, byte ptr [rax + rcx]    ; store buf[index] into eax
    pop rbp
    ret
(View it on Godbolt.)
This is obvious in retrospect: Base + Index is perfect for modeling array accesses where neither the array’s starting address nor the offset into the array is fixed at compile-time.
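The movsxd in that listing comes straight from the same-size rule. index is a 32-bit int, but buf is a 64-bit pointer, so the compiler sign-extends index to 64 bits before using [rax + rcx]. Zero-extension is what a plain 32-bit mov into ecx would give you, and it breaks negative indices. Here is a sketch of why; the helper names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical helpers modelling [rax + rcx] when rcx came from a 32-bit int. */

/* What movsxd does: sign-extend the 32-bit index to 64 bits first. */
static uint64_t ea_sign_extended(uint64_t base, int32_t index) {
    return base + (uint64_t)(int64_t)index;
}

/* What a 32-bit register write leaves behind: the upper 32 bits zeroed. */
static uint64_t ea_zero_extended(uint64_t base, int32_t index) {
    return base + (uint64_t)(uint32_t)index;
}
```

With a base of 0x1000 and an index of -1, the sign-extended form gives 0xfff. The zero-extended one lands on 0x100000fff, 4 GiB away.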
Base + Displacement
More indirection! In case you haven’t guessed it, calculating the effective address with both the base register and the displacement field corresponds to two operations:
- Load the value stored in the base register.
- Add the value of the displacement field to it.
Then, we take that sum and use it as our effective address. By way of example:
    ; add 0xcafe to the value stored in rax
    ; then, store the qword at the computed address into rbx
    mov rbx, [rax + 0xcafe]
Why would I (or my compiler) use this mode?
As we’ve seen with Base + Index, some addressing modes naturally reflect C-like array semantics. Base + Displacement can be thought of in a similar manner, but for structure semantics: the base register holds the address of the beginning of the structure, and the displacement field holds the fixed offset into that structure.
For example, the following:
    struct foo {
        long a;
        long b;
    };

    long bar(struct foo *foobar) {
        return foobar->b;
    }
assembles as:
    push rbp
    mov rbp, rsp
    mov qword ptr [rbp - 8], rdi
    mov rax, qword ptr [rbp - 8]   ; rax is foobar
    mov rax, qword ptr [rax + 8]   ; rax + 8 is foobar->b; store back into rax
    pop rbp
    ret
(View it on Godbolt.)
This also makes sense if you think about the stack construction and layout at the beginning of every function as a custom structure: accesses like [rbp - N] are basically stack->objN.
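To make the structure analogy concrete: the displacement is just the field’s offset, which C exposes as offsetof. Here is a minimal sketch that reuses the struct foo above and does the Base + Displacement arithmetic by hand. load_b is a hypothetical name.

```c
#include <stddef.h>

struct foo {
    long a;
    long b;
};

/* Base + Displacement done by hand: start from the struct's address (the base)
   and add the constant field offset (the displacement, 8 on x86_64). */
static long load_b(const struct foo *foobar) {
    const char *base = (const char *)foobar;
    const size_t disp = offsetof(struct foo, b);
    return *(const long *)(base + disp);
}
```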
Base + Index + Displacement
If the last mode makes sense to you, then this one is the logical next step: it’s semantically identical, except that we also add the value of the index register.
Just as above, but with one more register:
    ; add 0xcafe to the values stored in rax and rcx
    ; then, store the qword at the computed address into rbx
    mov rbx, [rax + rcx + 0xcafe]
Why would I (or my compiler) use this mode?
Just as Base + Index naturally models an array access and Base + Displacement naturally models structure access, Base + Index + Displacement naturally models structure access within an array!
I had a hard time getting clang to emit one of these on Godbolt, but eventually got one with -O1:
    struct foo {
        long a;
        long b;
    };

    long square(struct foo foos[], long i) {
        struct foo x = foos[i];
        return x.b;
    }
assembles to the very terse:
    shl rsi, 4                           ; rsi = i * 16 (sizeof(struct foo))
    mov rax, qword ptr [rdi + rsi + 8]   ; rdi is foos, rsi is i * 16, 8 is the field offset
    ret
(View it on Godbolt.)
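The shl rsi, 4 is worth a closer look. sizeof(struct foo) is 16, which isn’t one of the four legal scales, so clang multiplies i by 16 with a shift and then falls back to plain Base + Index + Displacement. Here is a sketch that redoes the arithmetic so it can be checked against ordinary indexing. The helper is hypothetical, and it assumes the LP64 layout used on x86_64 Linux and macOS.

```c
#include <stddef.h>

struct foo {
    long a;
    long b;
};

/* The shift below assumes a 16-byte struct foo (LP64, as on x86_64 Linux/macOS). */
_Static_assert(sizeof(struct foo) == 16, "sketch assumes 8-byte longs");

/* Mirrors: shl rsi, 4 ; mov rax, qword ptr [rdi + rsi + 8]
   base = foos (rdi), index = i << 4 (rsi), displacement = offsetof(b) (8).
   Assumes i >= 0 so the pointer arithmetic stays well-defined. */
static const long *addr_of_b(const struct foo foos[], long i) {
    const char *base = (const char *)foos;
    size_t index = (size_t)i << 4; /* i * 16: 16 is not a legal scale */
    return (const long *)(base + index + offsetof(struct foo, b));
}
```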
Base + (Index * Scale)
Our first multiplication!
The scale field is like displacement in that it’s a constant encoded into our instruction. Unlike displacement, however, scale is extremely constrained: it’s only two bits wide, meaning that it can only take one of four possible values: 1, 2, 4, or 8.
As the name implies, the scale field is used to scale (i.e., multiply) another field. In particular, it always scales the index register; scale cannot be used without index.
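One consequence of the 2-bit field is that a compiler can only fold an array’s element size into the addressing mode when that size is 1, 2, 4, or 8 bytes. Any other size needs extra work on the index first. Here is a hypothetical helper capturing the rule.

```c
#include <stddef.h>

/* Hypothetical helper: return the scale that lets Index * Scale step over
   elements of the given size, or 0 if no single scale can do it. */
static unsigned scale_for_element_size(size_t size) {
    switch (size) {
    case 1: case 2: case 4: case 8:
        return (unsigned)size;
    default:
        return 0; /* the index has to be scaled some other way (shift, lea, imul) */
    }
}
```

For a 16-byte element like the struct foo in the previous section it returns 0. That is the case where clang emitted a separate shl.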
Why would I (or my compiler) use this mode?
Among many other things, Base + (Index * Scale) naturally models accesses into an array of pointers (distinct from an array of laid-out structures, like above!):
    struct foo {
        long a;
        long b;
    };

    long bar(struct foo *foos[], long i) {
        struct foo *x = foos[i];
        return x->b;
    }
assembles to:
    mov rax, qword ptr [rdi + 8*rsi]   ; rdi is foos, rsi is i, 8 is the scale (pointer-sized!)
    mov rax, qword ptr [rax + 8]
    ret
(View it on Godbolt.)
(Index * Scale) + Displacement
Let’s keep going. This is almost identical to the last mode, except that we’ve swapped the base register out for the displacement field. No particular complexity there.
Why would I (or my compiler) use this mode?
(Index * Scale) + Displacement naturally models a specialized case of array access: when the array is statically addressable (e.g., a global) and the element size is computable via the scale.
For example:
    int tbl[10];

    int foo(int i) {
        return tbl[i];
    }
assembles to:
    movsxd rax, edi
    mov eax, dword ptr [4*rax + tbl]   ; rax is i, 4 is the scale (sizeof(int) == 4)
    ret
(View it on Godbolt.)
Base + (Index * Scale) + Displacement
Now we’re cooking with gas. This is the final and most complex x86_64 addressing form, but there’s absolutely nothing conceptually special about it: it’s just one more arithmetic operation on top of the three-parameter addressing modes.
Why would I (or my compiler) use this mode?
Base + (Index * Scale) + Displacement naturally models a two-dimensional array access:
    long tbl[10][10];

    long foo(long i, long j) {
        return tbl[i][j];
    }
assembles to:
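    lea rax, [rdi + 4*rdi]                   ; rax = 5 * i
    shl rax, 4                               ; rax = 80 * i, the byte offset of row i (10 longs * 8 bytes)
    mov rax, qword ptr [rax + 8*rsi + tbl]   ; rsi is j, 8 is the scale (sizeof(long) == 8)
    ret
(View it on Godbolt.)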
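The lea/shl pair computes the row offset. lea rax, [rdi + 4*rdi] gives 5 * i, and shl rax, 4 multiplies that by 16, for 80 * i, the size of one row of ten 8-byte longs. The final mov then adds 8 * j and tbl’s address. Here is a sketch that repeats the arithmetic by hand; the helper is hypothetical and assumes 8-byte longs.

```c
#include <stddef.h>

long tbl[10][10];

_Static_assert(sizeof(long) == 8, "sketch assumes 8-byte longs, as on x86_64 Linux/macOS");

/* Mirrors the listing:
     lea rax, [rdi + 4*rdi]          ; 5 * i
     shl rax, 4                      ; 80 * i, byte offset of row i
     mov rax, [rax + 8*rsi + tbl]    ; + 8 * j + address of tbl
   Assumes 0 <= i, j < 10. */
static long *addr_of_elem(long i, long j) {
    size_t row = ((size_t)i + 4 * (size_t)i) << 4; /* 80 * i */
    size_t col = 8 * (size_t)j;                    /* scale 8 = sizeof(long) */
    return (long *)((char *)tbl + row + col);
}
```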
RIP-relative addressing
The addressing mode documented above is almost identical to its historical x86_32 equivalent — its biggest changes are allowing 64-bit GPRs and (sometimes) 64-bit displacements.
Where x86_64 really diverges is in its addition of a brand new addressing mode, best known as “RIP-relative” addressing.
Why is it called “RIP-relative”? Because it encodes a displacement relative to the RIP register’s value (specifically the RIP of the next instruction, not the current one). This is usually represented with the familiar [Base + Displacement] syntax, except that the base register is now rip instead of a GPR:
mov rax, [rip + 16]
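Because the displacement is measured from the end of the current instruction, resolving a RIP-relative operand means adding the instruction’s length to its address before adding the displacement. Here is a sketch under stated assumptions. The helper names are hypothetical. The 7-byte length in the usage note is that of mov rax, [rip + 16]: a REX.W prefix, the opcode, a ModRM byte, and a 4-byte displacement. The second helper checks the ±2 GiB reach of a 32-bit displacement, which is what the small code model depends on.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: resolve a RIP-relative operand.
   RIP already points at the next instruction, so the instruction's length is
   added first; the 32-bit displacement is sign-extended before the addition. */
static uint64_t rip_relative_target(uint64_t insn_addr, unsigned insn_len, int32_t disp) {
    uint64_t next_rip = insn_addr + insn_len;
    return next_rip + (uint64_t)(int64_t)disp;
}

/* Hypothetical helper: can an instruction whose successor starts at next_rip
   reach target with a signed 32-bit displacement? */
static bool rip_reachable(uint64_t next_rip, uint64_t target) {
    uint64_t delta = target - next_rip;             /* wraps modulo 2^64 */
    return delta <= UINT64_C(0x000000007fffffff)    /* forward, up to 2 GiB - 1 */
        || delta >= UINT64_C(0xffffffff80000000);   /* backward, down to -2 GiB */
}
```

For a mov rax, [rip + 16] at 0x1000, rip_relative_target(0x1000, 7, 16) gives 0x1017.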
Why would I (or my compiler) use this mode?
For reasons that I originally said that I wouldn’t go into in this blog post: position-independent code and code models.
We’ll make a brief exception: using RIP-relative addressing makes position-independent code smaller and simpler, and is a natural fit for the “small” (and default) code model, where all code and data needs to be addressable within a 32-bit offset.
For example, the following, when compiled with -O1 and -fpic:
    long tbl[10];

    int foo(int i) {
        return tbl[i];
    }
requires just two movs on x86_64:
    foo:
        mov rax, qword ptr [rip + tbl@GOTPCREL]
        mov rax, qword ptr [rax + 8*rdi]
        ret
…but three and some additional boilerplate on x86_32:
    foo:
        call .L0$pb
    .L0$pb:
        pop eax
    .Ltmp0:
        add eax, offset _GLOBAL_OFFSET_TABLE_+(.Ltmp0-.L0$pb)
        mov ecx, dword ptr [esp + 4]
        mov eax, dword ptr [eax + tbl@GOT]
        mov eax, dword ptr [eax + 4*ecx]
        ret
One last catch: segmentation
x86_64 almost killed segmentation. Almost. Segment registers are no longer necessary thanks to the flat address space, but they still show up in a few places:
- Linux (really glibc) uses fs in userspace to access the TLS segments configured by the kernel. You can find these segments specified in the per-CPU GDT configuration. gs appears free for use in userspace, assuming something else in glibc (or whatever libc you use) doesn’t use it.
- Linux uses gs in kernelspace to store the base address for the per-CPU variable region. We can see this in the macro definition of PER_CPU_VAR:
      #define PER_CPU_VAR(var) %__percpu_seg:var
  which, on x86_64, expands to:
      %gs:var
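So, unfortunately, we still need to care about these. The good news is that caring about them isn’t too bad: they essentially boil down to adding the segment’s base address to the rest of the address calculation. In 64-bit mode, the fs and gs bases are typically set via model-specific registers (or the wrfsbase/wrgsbase instructions) rather than derived from the selector value held in the register.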
By way of example with a thread-local variable:
    int __thread x = 0;

    int foo(void) {
        int *y = &x;
        return *y;
    }
assembles to:
    push rbp
    mov rbp, rsp
    mov rax, qword ptr fs:[0]      ; grab the base address of the thread-local storage area
    lea rax, [rax + x@TPOFF]       ; calculate the effective address of x within the TLS
    mov qword ptr [rbp - 8], rax   ; store the address of x into y
    mov rax, qword ptr [rbp - 8]
    mov eax, dword ptr [rax]
    pop rbp
    ret
(View it on Godbolt.)