
How x86_64 addresses memory

source link: https://blog.yossarian.net/2020/06/13/How-x86_64-addresses-memory

Jun 13, 2020

Tags:programming

Today I’m going to write up one small (and yet still remarkably complicated) fragment of x86_64’s instruction semantics: memory addressing.

Specifically, I’m going to write up the different ways in which x86_64 allows the user to address memory via just one instruction: mov.

I won’t attempt to cover other instructions that can touch memory (which is pretty much all of them, thanks CISC), ones that write massive chunks of memory (looking at you, fxsave), or any adjacent subjects (code models, position independent code, binary relocations). I also won’t even try to cover historical addressing modes or modes that work when an x86_64 processor isn’t in 64-bit mode (i.e., any modes other than long mode with 64-bit code).

Some constraints

Despite (or perhaps thanks to?) the legacy hell that is x86_64’s instruction encoding, there are some constraints on how memory is addressed.

First, the good news:

  • At a high enough level, there are really only two addressing modes on x86_64.
  • Both addressing modes require all registers to be the same size as each other. In other words, we can’t do something weird like mixing 64, 32, and 16-bit registers to produce an effective address; there simply isn’t room in the x86_64 encoding to do so.

Now, the bad news:

  • One of those addressing modes is still stupidly complicated.
  • All registers have to be the same size as each other but don’t have to be the same size as the processor mode. In particular, we can use 32-bit registers instead of 64-bit ones by including the address-size prefix byte (0x67) in our encoding.
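
A rough C model of what the 0x67 prefix does to the computed address, assuming the usual long-mode behavior in which a 32-bit effective address wraps at 32 bits and is then zero-extended to 64 bits. The function name and parameters are illustrative only:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model: effective address with 32-bit address size (0x67 prefix).
 * All arithmetic wraps at 32 bits; the result is zero-extended to 64 bits. */
uint64_t ea_addr32(uint32_t base, uint32_t index, uint32_t scale, int32_t disp)
{
    /* Only these four scale factors can be encoded. */
    assert(scale == 1 || scale == 2 || scale == 4 || scale == 8);
    uint32_t ea = base + index * scale + (uint32_t)disp;
    return (uint64_t)ea;
}
```

Under this model, a base of 0xffffffff plus an index of 1 wraps around to address 0 rather than reaching 0x100000000.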

“Scale-Index-Base-Displacement” addressing

I call this mode “Scale-Index-Base-Displacement” because I have no idea what else to call it.

As far as I can tell, neither Intel nor AMD actually considers this to be a singular mode; instead, they refer to it as a general collection of related modes with a wide variety of different encodings.

But we’re not talking about encodings today: we’re talking about semantics, and semantically each of these related modes falls back to some combination of four parameters:

  • Scale: A 2-bit constant factor that is either 1, 2, 4, or 8.
  • Index: Any general purpose register (rax, rbx, etc.).
  • Base: Any general purpose register.
  • Displacement: An integral offset. This is normally limited to 32 bits even in 64-bit mode but can be 64 bits with a few select encodings. More on that later.

Various combinations of the four (including all four) are valid. Here are the valid combinations, in roughly increasing order of complexity:

Displacement
Base
Base + Index
Base + Displacement
Base + Index + Displacement
Base + (Index * Scale)
(Index * Scale) + Displacement
Base + (Index * Scale) + Displacement
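
Every combination in this list is a special case of one sum. Here is a minimal sketch of that sum in C, assuming a 32-bit displacement that is sign-extended before the 64-bit addition; the function is a made-up model, not anything the hardware or a library exposes:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the general form: Base + (Index * Scale) + Displacement.
 * Pass 0 for base or index to model the forms that omit them. */
uint64_t effective_address(uint64_t base, uint64_t index, unsigned scale, int32_t disp)
{
    /* Only these four scale factors can be encoded. */
    assert(scale == 1 || scale == 2 || scale == 4 || scale == 8);
    /* The 32-bit displacement is sign-extended before the 64-bit addition. */
    return base + index * scale + (uint64_t)(int64_t)disp;
}
```

For instance, effective_address(0x1000, 3, 8, 16) comes out to 0x1028, and effective_address(0x1000, 0, 1, -8) to 0xff8.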

Let’s go through them one by one.

Displacement

This is arguably the simplest addressing mechanism in the x86 family: the displacement field is treated as an absolute memory address.

Unfortunately, it’s also almost completely useless on x86_64. Remember that note about displacements almost always being 32 bits? That means you can’t represent an absolute address, since an absolute x86_64 address is 64 bits (really 48, but whatever) and just won’t fit in the displacement.
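
To make the size problem concrete, here is a small sketch that checks whether an address can be expressed as a sign-extended 32-bit displacement. The helper name is invented for illustration, and the cast from unsigned to signed assumes the usual two's complement behavior of mainstream compilers:

```c
#include <stdbool.h>
#include <stdint.h>

/* True if addr can be produced by sign-extending a 32-bit displacement.
 * Assumes the unsigned-to-signed conversion wraps (two's complement). */
bool fits_in_disp32(uint64_t addr)
{
    int64_t s = (int64_t)addr;
    return s >= INT32_MIN && s <= INT32_MAX;
}
```

Only the lowest 2 GiB and the highest 2 GiB of the address space pass this check; a typical userspace address such as 0x00007f0000000000 does not.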

There’s one exception to this: x86_64 allows for a 64-bit displacement with the a* registers.

In Intel syntax:

; store the qword at 0x00000000000000ff into rax
mov rax, [0xff]
; store the dword at 0x00000000000000ff into eax
mov eax, [0xff]
; store the word at 0x00000000000000ff into ax
mov ax, [0xff]
; store the byte at 0x00000000000000ff into al
mov al, [0xff]

gas (the GNU assembler) refers to these as movabs in both 32-bit and 64-bit modes.

Why would I (or my compiler) use this mode?

First of all, for code model reasons that aren’t relevant to this post. Eli Bendersky has a fantastic blog post on those.

More concretely: most programs have at least a few static addresses that are determined at compile-time, like global variables.

For example, this trivial program:

long x = 100;

long foo() {
  long *y = &x;
  return *y;
}

…yields:

push    rbp
mov     rbp, rsp
movabs  rax, offset x ; here! (x's address as a 64-bit constant)
mov     qword ptr [rbp - 8], rax
mov     rax, qword ptr [rbp - 8]
mov     rax, qword ptr [rax]
pop     rbp
ret

(View it on Godbolt.)

Base

Addressing via the base register adds one layer of indirection over absolute addressing: instead of an absolute address encoded into the instruction’s displacement field, an address is loaded from the specified general-purpose register (any GPR! Hooray!).

This indirection allows us to do absolute addressing with an arbitrary destination register via the following pattern:

; store the immediate (not displacement) into rbx
mov rbx, 0xacabacabacabacab

; store the qword at the address stored in rbx into rcx
mov rcx, [rbx]

…but we have relatively few reasons to do that, given the richer addressing modes we’re about to see.

Why would I (or my compiler) use this mode?

Because sometimes we have a calculated address already lying around from another operation, and we just want to use it.

The disassembly from the displacement sample above has a good example of this as well:

mov rax, qword ptr [rax]

Base + Index

This is just like addressing via the base register, except that we also add in the value of the index register.

For example:

; store the qword in rcx into the memory address computed
; as the sum of the values in rax and rbx
mov [rax + rbx], rcx

Why would I (or my compiler) use this mode?

I had a hard time contriving an example for this, which of course means that my coworkers immediately found one:

int foo(char * buf, int index) {
  return buf[index];
}

…which yields:

push    rbp
mov     rbp, rsp
mov     qword ptr [rbp - 8], rdi
mov     dword ptr [rbp - 12], esi
mov     rax, qword ptr [rbp - 8]  ; rax is buf
movsxd  rcx, dword ptr [rbp - 12] ; rcx is index
movsx   eax, byte ptr [rax + rcx] ; store buf[index] into eax
pop     rbp
ret

(View it on Godbolt.)

This is obvious in retrospect: Base + Index is perfect for modeling array accesses where neither the array’s starting address nor the offset into the array is fixed at compile-time.
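
The same computation can be spelled out by hand in C. This is only a sketch: the helper name is made up, and round-tripping a pointer through uintptr_t is implementation-defined, though it behaves as expected on mainstream x86_64 toolchains:

```c
#include <stdint.h>

/* Computes the address of buf[index] by hand: base + index.
 * The implicit scale is 1 because sizeof(char) == 1. */
const char *element_address(const char *buf, int index)
{
    uintptr_t base = (uintptr_t)buf;
    intptr_t idx = (intptr_t)index;            /* sign-extend, like movsxd */
    return (const char *)(base + (uintptr_t)idx);
}
```

Dereferencing the returned pointer reads the same byte as buf[index].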

Base + Displacement

More indirection! In case you haven’t guessed it, calculating the effective address with both the base register and the displacement field corresponds to two operations:

  1. We load the value stored in the base register
  2. We add the loaded value to the value of the displacement field

Then, we take that sum and use it as our effective address. By way of example:

; add 0xcafe to the value stored in rax
; then, store the qword at the computed address into rbx
mov rbx, [rax + 0xcafe]

Why would I (or my compiler) use this mode?

As we’ve seen with Base + Index, some addressing modes naturally reflect C-like array semantics.

Base + Displacement can be thought of in a similar manner, but for structure semantics: the base register holds the address to the beginning of the structure, and the displacement field holds the fixed offset into that structure.

For example, the following:

struct foo {
    long a;
    long b;
};

long bar(struct foo *foobar) {
    return foobar->b;
}

assembles as:

push    rbp
mov     rbp, rsp
mov     qword ptr [rbp - 8], rdi
mov     rax, qword ptr [rbp - 8] ; rax is foobar
mov     rax, qword ptr [rax + 8] ; rax + 8 is foobar->b; store back into rax
pop     rbp
ret

(View it on Godbolt.)

This also makes sense if you think about the stack construction and layout at the beginning of every function as a custom structure: accesses like [rbp - N] are basically stack->objN .
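
The displacement in the field access is just the field's byte offset, which C exposes through offsetof. A minimal sketch of the same load written as base plus fixed offset (load_b is a hypothetical helper, and the struct is repeated so the snippet stands alone):

```c
#include <stddef.h>
#include <string.h>

struct foo {
    long a;
    long b;
};

/* Reads foobar->b the way the generated code does:
 * start from the base address and add a fixed byte offset. */
long load_b(const struct foo *foobar)
{
    const unsigned char *base = (const unsigned char *)foobar;
    long value;
    memcpy(&value, base + offsetof(struct foo, b), sizeof value);
    return value;
}
```

On x86_64 Linux, where long is 8 bytes, offsetof(struct foo, b) is 8, matching the [rax + 8] in the listing.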

Base + Index + Displacement

If the last mode makes sense to you, then this one is the logical next step: it’s semantically identical, except that we also add the value of the index register.

Just as above, but with one more register:

; add 0xcafe to the values stored in rax and rcx
; then, store the qword at the computed address into rbx
mov rbx, [rax + rcx + 0xcafe]

Why would I (or my compiler) use this mode?

Just as Base + Index naturally models an array access and Base + Displacement naturally models structure access, Base + Index + Displacement naturally models structure access within an array!

I had a hard time getting clang to emit one of these on Godbolt, but eventually got one with -O1:

struct foo {
    long a;
    long b;
};

long square(struct foo foos[], long i) {
    struct foo x = foos[i];
    return x.b;
}

assembles to the very terse:

shl     rsi, 4
mov     rax, qword ptr [rdi + rsi + 8] ; rdi is foos, rsi is i, 8 is the field offset
ret

(View it on Godbolt.)
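
The shl is there because each element is 16 bytes wide, which is not one of the encodable scale factors, so the index is pre-multiplied by shifting left by 4. A sketch of the byte offset being computed, with a made-up helper name:

```c
#include <stddef.h>

struct foo {
    long a;
    long b;
};

/* Byte offset of foos[i].b from the start of the array.
 * With 8-byte longs, sizeof(struct foo) is 16, so i * sizeof is i << 4. */
size_t offset_of_b(size_t i)
{
    return i * sizeof(struct foo) + offsetof(struct foo, b);
}
```

The first term ends up in rsi after the shift, and the second term is the displacement of 8.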

Base + (Index * Scale)

Our first multiplication!

The scale field is like displacement in that it’s a constant factor that’s encoded into our instruction. Unlike displacement, however, scale is extremely constrained: it’s only two bits wide, meaning that it can only be one of four possible values: 1, 2, 4, or 8.

As the name implies, the scale field is used to scale (i.e., multiply) another field. In particular, it always scales the index register: scale cannot be used without index.
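
The reason two bits yield 1, 2, 4, or 8 rather than 0 through 3 is that the field stores the base-2 logarithm of the factor. A tiny sketch of that decoding, with an illustrative function name:

```c
#include <assert.h>

/* Decodes the 2-bit scale field: the stored value is log2 of the factor. */
unsigned scale_factor(unsigned ss)
{
    assert(ss < 4);      /* only two bits are available */
    return 1u << ss;     /* 0 -> 1, 1 -> 2, 2 -> 4, 3 -> 8 */
}
```

Put differently, scaling the index is a left shift by 0 to 3 bits.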

Why would I (or my compiler) use this mode?

Among many other things, Base + (Index * Scale) naturally models accesses into an array of pointers (distinct from an array of laid-out structures, like above!):

struct foo {
    long a;
    long b;
};

long bar(struct foo *foos[], long i) {
    struct foo *x = foos[i];
    return x->b;
}

assembles to:

mov     rax, qword ptr [rdi + 8*rsi] ; rdi is foos, rsi is i, 8 is the scale (pointer-sized!)
mov     rax, qword ptr [rax + 8]
ret

(View it on Godbolt.)

(Index * Scale) + Displacement

Let’s keep going. This is almost identical to the last mode, except that we’ve swapped the base register out for the displacement field. No particular complexity there.

Why would I (or my compiler) use this mode?

(Index * Scale) + Displacement naturally models a specialized case of array access: when the array is statically addressable (e.g., a global) and the element size is computable via the scale.

For example:

int tbl[10];

int foo(int i) {
    return tbl[i];
}

assembles to:

movsxd  rax, edi
mov     eax, dword ptr [4*rax + tbl] ; rax is i, 4 is the scale (sizeof(int) == 4)
ret

(View it on Godbolt.)

Base + (Index * Scale) + Displacement

Now we’re cooking with gas. This is the final and most complex x86_64 addressing form, but there’s absolutely nothing conceptually special about it: it’s just one more arithmetic operation on top of the three-parameter addressing modes.

Why would I (or my compiler) use this mode?

Base + (Index * Scale) + Displacement naturally models a two-dimensional array access:

long tbl[10][10];

long foo(long i, long j) {
    return tbl[i][j];
}

assembles to:

lea     rax, [rdi + 4*rdi]
shl     rax, 4
mov     rax, qword ptr [rax + 8*rsi + tbl]
ret

(View it on Godbolt.)
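
The lea and shl pair computes the row offset: each row holds ten 8-byte longs, so row i starts 80 bytes times i into the table, and 80 * i is (i + 4 * i) shifted left by 4. A sketch that mirrors the three instructions, assuming an 8-byte long and using a hypothetical helper name:

```c
#include <stddef.h>

/* Byte offset of tbl[i][j] in a long[10][10], computed the way the
 * compiler did. Assumes sizeof(long) == 8, as on x86_64 Linux. */
size_t offset_2d(size_t i, size_t j)
{
    size_t row = (i + 4 * i) << 4;  /* lea rax, [rdi + 4*rdi]; shl rax, 4 */
    return row + 8 * j;             /* + 8*rsi; tbl itself is the displacement */
}
```

The final mov then adds the address of tbl as the displacement, the row offset as the base, and j scaled by 8 as the index.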

RIP-relative addressing

The addressing mode documented above is almost identical to its historical x86_32 equivalent — its biggest changes are allowing 64-bit GPRs and (sometimes) 64-bit displacements.

Where x86_64 really diverges is in its addition of a brand new addressing mode, best known as “RIP-relative” addressing.

Why is it called “RIP-relative”? Because it encodes a displacement relative to the RIP register’s value (specifically the RIP of the next instruction, not the current one). This is usually represented with the familiar [Base + Displacement] syntax, except that the base register is now rip instead of a GPR:

mov rax, [rip + 16]
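
The "next instruction" detail matters when computing the target by hand. A minimal model, assuming we know where the instruction starts and how many bytes it occupies; both parameters and the function name are illustrative:

```c
#include <stdint.h>

/* Hypothetical model: the target of a RIP-relative operand.
 * insn_addr is where the instruction starts, insn_len its encoded length. */
uint64_t rip_relative_target(uint64_t insn_addr, unsigned insn_len, int32_t disp)
{
    uint64_t next_rip = insn_addr + insn_len;   /* RIP of the next instruction */
    return next_rip + (uint64_t)(int64_t)disp;  /* sign-extended displacement */
}
```

The mov above encodes to 7 bytes, so if it sat at 0x401000 it would read from 0x401000 + 7 + 16 = 0x401017.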

Why would I (or my compiler) use this mode?

For reasons that I originally said that I wouldn’t go into in this blog post: position-independent code and code models.

We’ll make a brief exception: using RIP-relative addressing makes position-independent code smaller and simpler, and is a natural fit for the “small” (and default) code model, where all code and data need to be addressable within a 32-bit offset.

For example, the following when compiled with -O1 and -fpic:

long tbl[10];

int foo(int i) {
    return tbl[i];
}

requires just two movs on x86_64:

foo:
        mov     rax, qword ptr [rip + tbl@GOTPCREL]
        mov     rax, qword ptr [rax + 8*rdi]
        ret

…but three and some additional boilerplate on x86_32:

foo:
        call    .L0$pb
.L0$pb:
        pop     eax
.Ltmp0:
        add     eax, offset _GLOBAL_OFFSET_TABLE_+(.Ltmp0-.L0$pb)
        mov     ecx, dword ptr [esp + 4]
        mov     eax, dword ptr [eax + tbl@GOT]
        mov     eax, dword ptr [eax + 4*ecx]
        ret

One last catch: segmentation

x86_64 almost killed segmentation. Almost. Segment registers are no longer necessary thanks to the flat address space, but they still show up in a few places:

  • Linux (really glibc) uses fs in userspace to access the TLS segments configured by the kernel. You can find these segments specified in the per-CPU GDT configuration. gs appears free for use in userspace, assuming something else in glibc (or whatever libc you use) doesn’t use it.

  • Linux uses gs in kernelspace to store the base address for the per-CPU variable region. We can see this in the macro definition of PER_CPU_VAR :

    #define PER_CPU_VAR(var)  %__percpu_seg:var
    

    which, on x86_64, expands to:

    %gs:var
    

So, unfortunately, we still need to care about these. The good news is that caring about them isn’t too bad: they essentially boil down to adding the segment’s base address (the one selected by the segment register) to the rest of the address calculation.
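
As a minimal sketch of that rule, with made-up names and treating the fs or gs base as a plain 64-bit number:

```c
#include <stdint.h>

/* Hypothetical model: with an fs or gs override, the segment's base address
 * is added to the effective address computed from the other four fields. */
uint64_t linear_address(uint64_t segment_base, uint64_t effective_address)
{
    return segment_base + effective_address;
}
```

An operand like fs:[0] therefore reads from the fs base itself, since its effective address is 0.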

By way of example with a thread-local variable:

int __thread x = 0;

int foo(void) {
    int *y = &x;
    return *y;
}

assembles to:

push    rbp
mov     rbp, rsp
mov     rax, qword ptr fs:[0]    ; grab the base address of the thread-local storage area
lea     rax, [rax + x@TPOFF]     ; calculate the effective address of x within the TLS
mov     qword ptr [rbp - 8], rax ; store the address of x into y
mov     rax, qword ptr [rbp - 8]
mov     eax, dword ptr [rax]
pop     rbp
ret

(View it on Godbolt.)

