Friday Q&A 2016-01-29: Swift Struct Storage
source link: https://www.mikeash.com/pyblog/friday-qa-2016-01-29-swift-struct-storage.html
Go to the source link to view the article. You can view the picture content, updated content and better typesetting reading experience. If the link is broken, please click the button below to view the snapshot at that time.
Swift's class
es tend to be straightforward for most people new to the language to understand. They work pretty much like classes in any other language. Whether you've come from Objective-C or Java or Ruby, you've worked with something similar. Swift's struct
s are another matter. They look sort of like class
es, but they're value types, and they don't do inheritance, and there's this copy-on-write thing I keep hearing about? Where do they live, anyway, and how do they work? Today, I'm going to take a close look at just how struct
s get stored and manipulated in memory.
Simple struct
s
To explore how struct
s get stored in memory, I built a test program consisting of two files. I compiled the test program with optimizations enabled but without whole-module optimization. By building tests that make calls from one file to the other, I was able to prevent the compiler from inlining everything, providing a clearer picture of where everything gets stored and how the data is passed between functions.
To start out, I created a simple struct
with three elements:
struct ExampleInts { var x: Int var y: Int var z: Int }
I created three functions that take an instance of this struct
and return one of the fields:
func getX(parameter: ExampleInts) -> Int { return parameter.x } func getY(parameter: ExampleInts) -> Int { return parameter.y } func getZ(parameter: ExampleInts) -> Int { return parameter.z }
In the other file, I created an instance of the struct
and called each get function:
func testGets() { let s = ExampleInts(x: 1, y: 2, z: 3) getX(s) getY(s) getZ(s) }
The compiler generates this code for getX
:
pushq %rbp movq %rsp, %rbp movq %rdi, %rax popq %rbp retq
Consulting our cheat sheet, we recall that arguments are passed sequentially in registers rdi
, rsi
, rdx
, rcx
, r8
, and r9
, and return values are placed in rax
. The first two instructions here are just the function prologue, and the last two are the epilogue. The real work being done here is the movq %rdi, %rax
, which takes the first parameter and returns it. Let's look at getY
:
pushq %rbp movq %rsp, %rbp movq %rsi, %rax popq %rbp retq
This is almost the same, but it returns the second parameter. How about getZ
?
pushq %rbp movq %rsp, %rbp movq %rdx, %rax popq %rbp retq
Again, almost the same, but it returns the third parameter. From this we can see that the individual struct
elements are treated as separate parameters and passed to the functions individually. Picking out an element on the receiving end is a simple matter of picking the right register.
Let's confirm this on the calling end. Here's the generated code for testGets
:
pushq %rbp movq %rsp, %rbp movl $1, %edi movl $2, %esi movl $3, %edx callq __TF4main4getXFVS_11ExampleIntsSi movl $1, %edi movl $2, %esi movl $3, %edx callq __TF4main4getYFVS_11ExampleIntsSi movl $1, %edi movl $2, %esi movl $3, %edx popq %rbp jmp __TF4main4getZFVS_11ExampleIntsSi
We can see it constructing the struct
instance directly in the parameter registers. (The edi
, esi
, and edx
registers refer to the lower 32 bits of the rdi
, rsi
, and rdx
registers, respectively.) It doesn't even bother trying to save the values across the calls, but just rebuilds the struct
instance each time. Since the compiler knows exactly what the contents are, it can deviate significantly from how the Swift code is written. Note how the call to getZ
is generated a bit differently from the other two. Since it's the last thing in the function, the compiler generates it as a tail call, cleaning up the local call frame and setting up getZ
to return directly to the function that called testGets
.
Let's see what sort of code the compiler generates when it doesn't know the struct
contents. Here's a variant on this test function which gets the struct
instance from elsewhere:
func testGets2() { let s = getExampleInts() getX(s) getY(s) getZ(s) }
getExampleInts
just creates the struct
instance and returns it, but it's in the other file so the compiler can't see what's going on when optimizing testGets2
. Here's that function:
func getExampleInts() -> ExampleInts { return ExampleInts(x: 1, y: 2, z: 3) }
What sort of code does testGets2
generate, now that the compiler can't know what the struct
contains? Here it is:
pushq %rbp movq %rsp, %rbp pushq %r15 pushq %r14 pushq %rbx pushq %rax callq __TF4main14getExampleIntsFT_VS_11ExampleInts movq %rax, %rbx movq %rdx, %r14 movq %rcx, %r15 movq %rbx, %rdi movq %r14, %rsi movq %r15, %rdx callq __TF4main4getXFVS_11ExampleIntsSi movq %rbx, %rdi movq %r14, %rsi movq %r15, %rdx callq __TF4main4getYFVS_11ExampleIntsSi movq %rbx, %rdi movq %r14, %rsi movq %r15, %rdx addq $8, %rsp popq %rbx popq %r14 popq %r15 popq %rbp jmp __TF4main4getZFVS_11ExampleIntsSi
Since the compiler can't reconstitute the values at each step, it has to save them. It places the three struct
elements into the registers rbx
, r14
, and r15
, then loads the parameter registers from those registers at each call. Those registers are saved by the caller, which means that their values are preserved across the call. And again, the compiler generates a tail call for getZ
, with some more extensive cleanup beforehand.
At the top of the function, it calls getExampleInts
and then loads values from rax
, rdx
, and rcx
. Apparently the struct
values are returned in those registers. Let's look at getExampleInts
to confirm:
pushq %rbp movl $1, %edi movl $2, %esi movl $3, %edx popq %rbp jmp __TFV4main11ExampleIntsCfMS0_FT1xSi1ySi1zSi_S0_
This places the values 1
, 2
, and 3
into the argument registers, then calls the struct
's constructor. Here's the generated code for that constructor:
pushq %rbp movq %rsp, %rbp movq %rdx, %rcx movq %rdi, %rax movq %rsi, %rdx popq %rbp retq
Sure enough, it returns the three values in rax
, rdx
, and rcx
. The cheat sheet says nothing about returning multiple values in multiple registers. How about the official PDF? It does say that two values can be returned in rax
and rdx
, but there's no mention of returning a third value in rcx
. That's clearly what's happening, though. That's the fun of a new language: it doesn't always have to play by the old rules. If it was interoperating with C code it would have to follow the standard conventions, but Swift-to-Swift calls can invent new ones.
How about inout
parameters? If they work like we'd do it in C, we'd expect the struct
to be laid out in memory and a pointer passed in. Here are two test functions (in two different files, of course):
func testInout() { var s = getExampleInts() totalInout(&s) } func totalInout(inout parameter: ExampleInts) -> Int { return parameter.x + parameter.y + parameter.z }
Here's the generated code for testInout
:
pushq %rbp movq %rsp, %rbp subq $32, %rsp callq __TF4main14getExampleIntsFT_VS_11ExampleInts movq %rax, -24(%rbp) movq %rdx, -16(%rbp) movq %rcx, -8(%rbp) leaq -24(%rbp), %rdi callq __TF4main10totalInoutFRVS_11ExampleIntsSi addq $32, %rsp popq %rbp retq
In the prologue, it creates a 32-byte stack frame. It then calls getExampleInts
, and after the call saves the resulting values into stack slots at offsets -24
, -16
, and -8
. It then calculates a pointer to offset -24
, loads that into the rdi
parameter register, and calls totalInout
. Here's the generated code for that function:
pushq %rbp movq %rsp, %rbp movq (%rdi), %rax addq 8(%rdi), %rax jo LBB4_3 addq 16(%rdi), %rax jo LBB4_3 popq %rbp retq LBB4_3: ud2
This loads the values by offset from the parameter that's passed in, totaling them up and returning the result in rax
. The jo
instructions are checking for overflow. If either of the addq
instructions produce an oveflow, the jo
instructions will jump down to the ud2
instruction which terminates the program.
We can see that it's exactly as we expected: when passing the struct
to an inout
parameter, the struct
is laid out contiguously in memory and then a pointer to it is passed in.
Big struct
s
What happens if we're dealing with a larger struct, bigger than fits comfortably in registers? Here's a test struct
with ten elements:
struct TenInts { var elements = (1, 2, 3, 4, 5, 6, 7, 8, 9, 10) }
Here's a get function that constructs an instance and returns it. This is placed in a separate file to avoid inlining:
func getHuge() -> TenInts { return TenInts() }
Here's a function that gets an element out of this struct
:
func getHugeElement(parameter: TenInts) -> Int { return parameter.elements.5 }
Finally, a test function that exercises these:
func testHuge() { let s = getHuge() getHugeElement(s) }
Let's look at the generated code, starting with testHuge
:
pushq %rbp movq %rsp, %rbp subq $160, %rsp leaq -80(%rbp), %rdi callq __TF4main7getHugeFT_VS_7TenInts movups -80(%rbp), %xmm0 movups -64(%rbp), %xmm1 movups -48(%rbp), %xmm2 movups -32(%rbp), %xmm3 movups -16(%rbp), %xmm4 movups %xmm0, -160(%rbp) movups %xmm1, -144(%rbp) movups %xmm2, -128(%rbp) movups %xmm3, -112(%rbp) movups %xmm4, -96(%rbp) leaq -160(%rbp), %rdi callq __TF4main14getHugeElementFVS_7TenIntsSi addq $160, %rsp popq %rbp retq
This code (excluding the function prologue and epilogue) can be broken into three pieces.
The first piece calculates the address for offset -80
relative to the stack frame, and calls getHuge
, passing that address as a parameter. The getHuge
function has no parameters in the source code, but it's common to use a hidden parameter to return larger struct
s. The caller allocates storage for the return value, then passes a pointer to that storage in the hidden parameter. That appears to be what's going on here, with that allocated storage residing on the stack.
The second piece copies the returned struct
from stack offset -80
to stack offset -160
. It loads pieces of the struct
sixteen bytes at a time into five xmm
registers, then places the contents of those registers back on the stack starting at offset -160
. I'm not clear why the compiler generates this copy rather than using the original value in place. I suspect the optimizer just isn't quite smart enough to realize that it doesn't need the copy.
The third piece calculates the address for stack offset -160
and then calls getHugeElement
passing that address as a parameter. In our previous experiment with a three-element struct
, it was passed by value in registers. With this larger struct
, it's passed by pointer instead.
The generated code for the other functions confirms this: the struct
is passed in and out by pointer, and lives on the stack. Here's getHugeElement
to start with:
pushq %rbp movq %rsp, %rbp movq 40(%rdi), %rax popq %rbp retq
This loads offset 40
from the parameter passed in. Each element is eight bytes, so offset 40
corresponds to elements.5
. The function then returns this value.
Here's getHuge
:
pushq %rbp movq %rsp, %rbp pushq %rbx subq $88, %rsp movq %rdi, %rbx leaq -88(%rbp), %rdi callq __TFV4main7TenIntsCfMS0_FT_S0_ movups -88(%rbp), %xmm0 movups -72(%rbp), %xmm1 movups -56(%rbp), %xmm2 movups -40(%rbp), %xmm3 movups -24(%rbp), %xmm4 movups %xmm0, (%rbx) movups %xmm1, 16(%rbx) movups %xmm2, 32(%rbx) movups %xmm3, 48(%rbx) movups %xmm4, 64(%rbx) movq %rbx, %rax addq $88, %rsp popq %rbx popq %rbp retq
This looks a lot like testHuge
above: it allocates stack space, calls a function, in this case, the TenInts
constructor function, then copies the return value to its final location. Here, that final location is the pointer passed in as the implicit parameter.
While we're here, let's take a look at the TenInts
constructor:
pushq %rbp movq %rsp, %rbp movq $1, (%rdi) movq $2, 8(%rdi) movq $3, 16(%rdi) movq $4, 24(%rdi) movq $5, 32(%rdi) movq $6, 40(%rdi) movq $7, 48(%rdi) movq $8, 56(%rdi) movq $9, 64(%rdi) movq $10, 72(%rdi) movq %rdi, %rax popq %rbp retq
Like the other functions, this takes a pointer to memory for the new struct
as an implicit parameter. It then stores the values 1
through 10
into that memory and returns.
I came across an interesting case while building out these test cases. Here's a test function which makes three calls to getHugeElement
intsead of just one:
func testThreeHuge() { let s = getHuge() getHugeElement(s) getHugeElement(s) getHugeElement(s) }
Here's the generated code:
pushq %rbp movq %rsp, %rbp pushq %r15 pushq %r14 pushq %r13 pushq %r12 pushq %rbx subq $392, %rsp leaq -120(%rbp), %rdi callq __TF4main7getHugeFT_VS_7TenInts movq -120(%rbp), %rbx movq %rbx, -376(%rbp) movq -112(%rbp), %r8 movq %r8, -384(%rbp) movq -104(%rbp), %r9 movq %r9, -392(%rbp) movq -96(%rbp), %r10 movq %r10, -400(%rbp) movq -88(%rbp), %r11 movq %r11, -368(%rbp) movq -80(%rbp), %rax movq -72(%rbp), %rcx movq %rcx, -408(%rbp) movq -64(%rbp), %rdx movq %rdx, -416(%rbp) movq -56(%rbp), %rsi movq %rsi, -424(%rbp) movq -48(%rbp), %rdi movq %rdi, -432(%rbp) movq %rbx, -200(%rbp) movq %rbx, %r14 movq %r8, -192(%rbp) movq %r8, %r15 movq %r9, -184(%rbp) movq %r9, %r12 movq %r10, -176(%rbp) movq %r10, %r13 movq %r11, -168(%rbp) movq %rax, -160(%rbp) movq %rax, %rbx movq %rcx, -152(%rbp) movq %rdx, -144(%rbp) movq %rsi, -136(%rbp) movq %rdi, -128(%rbp) leaq -200(%rbp), %rdi callq __TF4main14getHugeElementFVS_7TenIntsSi movq %r14, -280(%rbp) movq %r15, -272(%rbp) movq %r12, -264(%rbp) movq %r13, -256(%rbp) movq -368(%rbp), %rax movq %rax, -248(%rbp) movq %rbx, -240(%rbp) movq -408(%rbp), %r14 movq %r14, -232(%rbp) movq -416(%rbp), %r15 movq %r15, -224(%rbp) movq -424(%rbp), %r12 movq %r12, -216(%rbp) movq -432(%rbp), %r13 movq %r13, -208(%rbp) leaq -280(%rbp), %rdi callq __TF4main14getHugeElementFVS_7TenIntsSi movq -376(%rbp), %rax movq %rax, -360(%rbp) movq -384(%rbp), %rax movq %rax, -352(%rbp) movq -392(%rbp), %rax movq %rax, -344(%rbp) movq -400(%rbp), %rax movq %rax, -336(%rbp) movq -368(%rbp), %rax movq %rax, -328(%rbp) movq %rbx, -320(%rbp) movq %r14, -312(%rbp) movq %r15, -304(%rbp) movq %r12, -296(%rbp) movq %r13, -288(%rbp) leaq -360(%rbp), %rdi callq __TF4main14getHugeElementFVS_7TenIntsSi addq $392, %rsp popq %rbx popq %r12 popq %r13 popq %r14 popq %r15 popq %rbp retq
The structure of this function is similar to the previous version. It calls getHuge
, copies the result, then calls getHugeElement
three times. For each call, it copies the struct
again, presumably to guard against getHugeElement
making modifications. What I found really interesting is that the copies are all done one element at a time using integer registers, rather than two elements at a time in xmm
registers as testHuge
did. I'm not sure what causes the compiler to choose the integer registers here, as it seems like copying two elements at a time with the xmm
registers would be more efficient and result in smaller code.
I also experimented with really large structs
:
struct HundredInts { var elements = (TenInts(), TenInts(), TenInts(), TenInts(), TenInts(), TenInts(), TenInts(), TenInts(), TenInts(), TenInts()) } struct ThousandInts { var elements = (HundredInts(), HundredInts(), HundredInts(), HundredInts(), HundredInts(), HundredInts(), HundredInts(), HundredInts(), HundredInts(), HundredInts()) } func getThousandInts() -> ThousandInts { return ThousandInts() }
The generated code for getThousandInts
is pretty crazy:
pushq %rbp pushq %rbx subq $8008, %rsp movq %rdi, %rbx leaq -8008(%rbp), %rdi callq __TFV4main12ThousandIntsCfMS0_FT_S0_ movq -8008(%rbp), %rax movq %rax, (%rbx) movq -8000(%rbp), %rax movq %rax, 8(%rbx) movq -7992(%rbp), %rax movq %rax, 16(%rbx) movq -7984(%rbp), %rax movq %rax, 24(%rbx) movq -7976(%rbp), %rax movq %rax, 32(%rbx) movq -7968(%rbp), %rax movq %rax, 40(%rbx) movq -7960(%rbp), %rax movq %rax, 48(%rbx) movq -7952(%rbp), %rax movq %rax, 56(%rbx) movq -7944(%rbp), %rax movq %rax, 64(%rbx) movq -7936(%rbp), %rax movq %rax, 72(%rbx) ... movq -104(%rbp), %rax movq %rax, 7904(%rbx) movq -96(%rbp), %rax movq %rax, 7912(%rbx) movq -88(%rbp), %rax movups -80(%rbp), %xmm0 movups -64(%rbp), %xmm1 movups -48(%rbp), %xmm2 movups -32(%rbp), %xmm3 movq %rax, 7920(%rbx) movq -16(%rbp), %rax movups %xmm0, 7928(%rbx) movups %xmm1, 7944(%rbx) movups %xmm2, 7960(%rbx) movups %xmm3, 7976(%rbx) movq %rax, 7992(%rbx) movq %rbx, %rax addq $8008, %rsp popq %rbx popq %rbp retq
The compiler generates two thousand instructions to copy this struct
. This seems like a good place to emit a call to memcpy
, but I imagine that optimizing for absurdly gigantic struct
s isn't a high priority for the compiler team right now.
Class Fields
Let's take a look at what happens when the struct
fields are more complicated than simple integers. Here's a simple class
, and a struct
which contains one:
class ExampleClass {} struct ContainsClass { var x: Int var y: ExampleClass var z: Int }
Here's a set of functions (split across two files to defeat inlining) which exercise them:
func testContainsClass() { let s = ContainsClass(x: 1, y: getExampleClass(), z: 3) getClassX(s) getClassY(s) getClassZ(s) } func getExampleClass() -> ExampleClass { return ExampleClass() } func getClassX(parameter: ContainsClass) -> Int { return parameter.x } func getClassY(parameter: ContainsClass) -> ExampleClass { return parameter.y } func getClassZ(parameter: ContainsClass) -> Int { return parameter.z }
Let's start by looking at the generated code for the getters. Here's getClassX
:
pushq %rbp movq %rsp, %rbp pushq %rbx pushq %rax movq %rdi, %rbx movq %rsi, %rdi callq _swift_release movq %rbx, %rax addq $8, %rsp popq %rbx popq %rbp retq
The three struct
elements will be passed in the first three parameter registers, rdi
, rsi
, and rdx
. This function wants to return the value in rdi
by moving it to rax
and then returning, but it has to do some bookkeeping first. It appears that the object reference passed in rsi
is passed retained, and must be released before the function returns. This code moves rdi
into a safe temporary register, rbx
, then moves the object reference to rdi
and calls swift_release
to release it. It then moves the value in rbx
to the return register rax
and returns from the function.
The code for getClassZ
is pretty much the same, except instead of taking the value from rdi
, it takes it from rdx
:
pushq %rbp movq %rsp, %rbp pushq %rbx pushq %rax movq %rdx, %rbx movq %rsi, %rdi callq _swift_release movq %rbx, %rax addq $8, %rsp popq %rbx popq %rbp retq
The code for getClassY
will be the odd one, since it returns an object reference rather than an integer. Here it is:
pushq %rbp movq %rsp, %rbp movq %rsi, %rax popq %rbp retq
This is short! It moves the value from rsi
, which is the object reference, into rax
and returns it. There's no bookkeeping, just a quick shuffling of data. Apparently, the value is passed in retained, but also returned retained, so this code doesn't have to do any memory management at all.
So far we've seen that the code for dealing with this struct
is much like the code for dealing with the struct
containing three Int
fields, except that the object reference field is passed in retained and must be released by the callee. With that in mind, let's look at the generated code for testContainsClass:
pushq %rbp movq %rsp, %rbp pushq %r14 pushq %rbx callq __TF4main15getExampleClassFT_CS_12ExampleClass movq %rax, %rbx movq %rbx, %rdi callq _swift_retain movq %rax, %r14 movl $1, %edi movl $3, %edx movq %rbx, %rsi callq __TF4main9getClassXFVS_13ContainsClassSi movq %r14, %rdi callq _swift_retain movl $1, %edi movl $3, %edx movq %rbx, %rsi callq __TF4main9getClassYFVS_13ContainsClassCS_12ExampleClass movq %rax, %rdi callq _swift_release movl $1, %edi movl $3, %edx movq %rbx, %rsi popq %rbx popq %r14 popq %rbp jmp __TF4main9getClassZFVS_13ContainsClassSi
The first thing this function does is call getExampleClass
to get the ExampleClass
instance it stores in the struct
. It takes the returned reference and moves it to rbx
for safekeeping.
Next, it calls getClassX
, and to do so it has to build a copy of the struct
in the parameter registers. The two integer fields are easy, but the object field needs to be retained to match what the functions expect. The code calls swift_retain
on the value stored in rbx
, then places that value in rsi
and places 1
and 3
in rdi
and rdx
to build the complete struct. Finally, it calls getClassX
.
The code to call getClassY
is nearly the same. However, getClassY
returns an object reference which needs to be released. After the call, this code moves the return value into rdi
and calls swift_release
to take care of its required memory management.
This function calls getClassZ
as a tail call, so the code here is a bit different. The object reference came retained from getExampleClass
, so it doesn't need to be retained separately for this final call. This code places it into rsi
, places 1
and 3
into rdi
and rdx
again, then cleans up the stack and jumps to getClassZ
to make the final call.
Ultimately, there's little change from a struct
with all Int
s. The only real difference is that copying a struct
with an object in it requires retaining that object, and disposing of that struct
requires releasing the object.
Conclusionstruct
storage in Swift is ultimately pretty straightforward, and much of what we've seen carries over from C's much simpler struct
s. A struct
instance is largely treated as a loose collection of independent values, which can be manipulated collectively when required. Local struct
variables might be stored on the stack or the individual pieces might be stored in registers, depending on the size of the struct
, the register usage of the rest of the code, and the whims of the compiler. Small struct
s are passed and returned in registers, while larger struct
s are passed and returned by reference. struct
s get copied whenever they're passed and returned. Although you can use struct
s to implement copy-on-write data types, the base language construct is copied eagerly and more or less blindly.
That's it for today. Come back next time for more daring feats of programming. Friday Q&A is driven by reader ideas, so if you grow bored while waiting for the next installment and have something you'd like to see discussed, send it in!
Comments:
Jon: All of these examples are with optimization enabled, but not whole-module optimization. WMO will allow inlining and many other fancy things even when parts are in separate files, as long as they're all in the same module (e.g. in the same app target). This is great for real-world code but troublesome for testing codegen because inlining and dead-code elimination make it hard to get the compiler to generate anything useful (or indeed, anything at all) for simplified examples. If you have WMO enabled then some of the cases where struct data has to be loaded into registers or stack memory will be eliminated due to inlining, when the compiler judges it to be a net win. Where the compiler refrains from inlining, then the calls will look just like they do here.
Hope all of these mov will please the processor.
Using as much as possible of different registers will be better optimized by the out-of-order cpu exec.
Swift brags about copy-on-write structs, but it's the Swift stdlib collection types (Array, etc) that implement CoW. It's not a language-level feature.
In fact, the Swift team suggests using it, when possible, by simply stealing these implementations:
"To eliminate the cost of copying large values adopt copy-on-write behavior. The easiest way to implement copy-on-write is to compose existing copy-on-write data structures, such as Array."
Comments RSS feed for this page
Add your thoughts, post a comment:
Spam and off-topic posts will be deleted without notice. Culprits may be publicly humiliated at my sole discretion.
Name:Web site:The Answer to the Ultimate Question of Life, the Universe, and Everything?Comment:Formatting: <i> <b> <blockquote> <code>. URLs are automatically hyperlinked.Code syntax highlighting thanks to Pygments.Recommend
About Joyk
Aggregate valuable and interesting links.
Joyk means Joy of geeK