
Explain GNU style linker options

source link: http://maskray.me/blog/2020-11-15-explain-gnu-linker-options

(First, let me celebrate reaching 2000 commits to LLVM!)

Compiler driver options

Before describing the linker options, let's introduce the concept of driver options. The user-facing options of gcc and clang are called driver options. Some driver options affect the options passed to the linker. Many of them have the same names as linker options, and besides passing the same-named option to the linker, they often have additional effects, such as:

  • -shared: Don't set -dynamic-linker; don't link crt1.o
  • -static: Don't set -dynamic-linker; use crtbeginT.o instead of crtbegin.o; link -lgcc -lgcc_eh -lc inside --start-group (they have a (bad) circular dependency)

-Wl,--foo,value,--bar=value passes the three options --foo, value, and --bar=value to the linker. If there are many link options, you can put them in a text file response.txt, one per line, and then specify -Wl,@response.txt.

Note that -O2 will not pass -O2 to the linker, but -Wl,-O2 will.
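To make the comma splitting concrete, here is a minimal sketch of how a -Wl, option maps to linker arguments. splitWl is a hypothetical helper that models only the comma splitting (it ignores corner cases such as empty fields), not how GCC or Clang actually implement it.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Split a -Wl,... driver option into the arguments forwarded to the linker:
// everything after "-Wl," is split at commas. Other options yield nothing.
std::vector<std::string> splitWl(const std::string &opt) {
  const std::string prefix = "-Wl,";
  std::vector<std::string> args;
  if (opt.compare(0, prefix.size(), prefix) != 0)
    return args; // not a -Wl, option (e.g. a plain -O2 is not forwarded)
  std::string cur;
  for (std::size_t i = prefix.size(); i < opt.size(); ++i) {
    if (opt[i] == ',') {
      args.push_back(cur);
      cur.clear();
    } else {
      cur += opt[i];
    }
  }
  args.push_back(cur);
  return args;
}
```

For example, splitWl("-Wl,-O2") yields the single linker argument -O2, while a plain -O2 yields nothing.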

  • -fno-pic and -fno-PIC are synonyms and generate position-dependent code.
  • -fpie and -fPIE are called small PIE and large PIE, respectively. They add an optimization on top of PIC: the compiled .o can only be used in executables. See -Bsymbolic below.
  • -fpic and -fPIC are called small PIC and large PIC, respectively. Both generate position-independent code. The two modes differ in code generation on 32-bit PowerPC and SPARC (architectures that are about to retire); on most architectures there is no difference.

Input files

The linker accepts several types of input. For symbols, the symbol table of each input file will affect symbol resolution. For sections, only sections (called input sections) in regular object files will contribute to the sections of the output file (called output sections).

  • .o (regular object files)
  • .so (shared objects): only affects symbol resolution
  • .a (archive files)

Archive member selection

The .a file has the special semantics of archive member selection: each member is lazy. An archive has an optional index (a simplified symbol table), which lists defined symbols and the names of the members defining them.

If the linker finds that an archive member in .a defines a previously undefined symbol, it will pull this member from the archive. The member will conceptually become a regular object file. Its symbol table will be used for symbol resolution and its sections will contribute to output sections.

If the archive cannot satisfy any previously undefined symbol, GNU ld and gold skip the archive entirely and do not revisit it. See --warn-backrefs below for details.

The linking semantics of a thin archive are the same as those of a regular archive.

--start-group can change the semantics of archive member selection. --whole-archive can cancel archive member selection and restore object file semantics.
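The member selection above can be sketched in a toy model of GNU ld/gold-style processing in command-line order. The Object and Linker types are hypothetical, each object is reduced to the symbols it defines and references, and LLD's actual implementation works differently.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical object file: the symbols it defines and references.
struct Object {
  std::string name;
  std::set<std::string> defs, refs;
};

// Toy linker state for processing inputs in command-line order.
struct Linker {
  std::set<std::string> defined, undefined;
  std::vector<std::string> loaded; // regular objects, including pulled members

  void addObject(const Object &o) {
    loaded.push_back(o.name);
    for (const auto &s : o.defs) {
      defined.insert(s);
      undefined.erase(s);
    }
    for (const auto &s : o.refs)
      if (!defined.count(s))
        undefined.insert(s);
  }

  // Pull each member that defines a currently undefined symbol. Rescan the
  // archive until nothing more is pulled, because a pulled member may
  // introduce new undefined symbols that other members satisfy.
  void addArchive(const std::vector<Object> &members) {
    std::vector<bool> pulled(members.size(), false);
    for (bool changed = true; changed;) {
      changed = false;
      for (std::size_t i = 0; i < members.size(); ++i) {
        if (pulled[i])
          continue;
        for (const auto &s : members[i].defs) {
          if (undefined.count(s)) {
            pulled[i] = true;
            changed = true;
            addObject(members[i]); // the member becomes a regular object
            break;
          }
        }
      }
    }
  }
};
```

In this model a member that satisfies nothing (c.o below) is never pulled, so an object that references its symbol later leaves it undefined: the archive is not revisited.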

The linker operates in one of the following four modes. The mode controls the output type (executable/shared object/relocatable object file).

  • -no-pie (default): Generate a position-dependent executable (ET_EXEC). This mode has the loosest requirements: the source files can be compiled with -fno-pic, -fpie, or -fpic.
  • -pie: Generate a position-independent executable (ET_DYN). Source files need to be compiled with -fpie or -fpic.
  • -shared: Generate a position-independent shared object (ET_DYN). This is the most restrictive mode: source files need to be compiled with -fpic.
  • -r: Relocatable link. This mode is special: it suppresses various linker-synthesized sections and keeps relocations.

-pie and -shared are both position-independent link modes; -pie and -no-pie are both executable link modes. -pie is very similar to -shared -Bsymbolic, but it produces an executable after all. In the following behaviors, -pie is close to -no-pie and different from -shared:

  • Allow copy relocations and canonical PLT entries.
  • Allow relaxing the General Dynamic/Local Dynamic TLS models and TLS descriptors to Initial Exec/Local Exec.
  • Resolve undefined weak symbols to zero at link time. LLD does not generate dynamic relocations for them. Whether GNU ld generates dynamic relocations follows very complicated, architecture-dependent rules.

Confusingly, the compiler driver provides several options with the same names: -no-pie, -pie, -shared, -r. GCC 6 introduced the configure-time option --enable-default-pie: such builds enable -fpie and -pie by default. Many Linux distributions have now enabled this option as a basic security hardening measure.

-no-pie

A defined symbol is not preemptible. All relocations referencing a non-preemptible symbol can be resolved at link time, including absolute GOT-generating ones (e.g. R_AARCH64_LD64_GOT_LO12_NC), PC-relative GOT-generating ones (e.g. R_X86_64_REX_GOTPCRELX), etc. If the GOT entry for a non-preemptible symbol cannot be optimized out, it holds a link-time constant. The image base is an arch-specific non-zero value by default.

  • Some architectures have different PLT code sequences (i386, ppc32 .glink).
  • R_X86_64_GOTPCRELX and R_X86_64_REX_GOTPCRELX can be further optimized
  • ppc64 .branch_lt (long branch addresses) can be optimized

-pie and -shared

A symbolic relocation (an absolute relocation whose width matches the word size) referencing a non-preemptible non-TLS symbol is converted to a relative relocation.

-pie

A defined symbol is not preemptible.

-shared

A defined symbol with default visibility is preemptible by default (unlike COFF).

Symbol related

--exclude-libs

If a matched archive defines a non-local symbol, don't export the symbol.

--export-dynamic

By default, shared objects export all non-local STV_DEFAULT/STV_PROTECTED defined symbols to the dynamic symbol table. Executables can use --export-dynamic to emulate the behavior of shared objects.

A symbol is exported from an executable/shared object if all of the following hold (logical AND):

  • non-local STV_DEFAULT/STV_PROTECTED (this means it can be hidden by --exclude-libs)
  • logical OR of the following:
    • undefined
    • (--export-dynamic || -shared) && ! (unnamed_addr linkonce_odr GlobalVariable || local_unnamed_addr linkonce_odr constant GlobalVariable)
    • matched by --dynamic-list/--export-dynamic-symbol-list/--export-dynamic-symbol
    • defined or referenced by a shared object as STV_DEFAULT
    • STV_PROTECTED definition in a shared object preempted by copy relocation/canonical PLT when --ignore-{data,function}-address-equality is specified
    • -z ifunc-noplt && has at least one relocation

If the executable file defines a symbol that is referenced by a link-time shared object, the linker exports the symbol so that the undefined symbol in the shared object can be bound to the definition in the executable file at runtime. If the executable file defines a symbol that is also defined by a link-time shared object, the linker exports the symbol to enable symbol interposition at runtime.

-Bsymbolic and --dynamic-list

In an ELF shared object, a defined non-local STV_DEFAULT symbol is preemptible (interposable) by default, that is, the definition may be replaced by a definition in the executable or another shared object at runtime. A definition in the executable is guaranteed to be non-preemptible (non-interposable). Code compiled with -fpic may end up in a shared object, so by default a reference to a definition in the same linkage unit (a module: an executable or a shared object) has unnecessary overhead: the indirection through the GOT or PLT.

The linker provides several mechanisms, such as -Bsymbolic, -Bsymbolic-functions, version scripts, and --dynamic-list, to make some symbols non-preemptible and get behavior similar to -no-pie and -pie.

  • -Bsymbolic: all defined symbols are non-preemptible
  • -Bsymbolic-functions: all defined STT_FUNC (function) symbols are non-preemptible
  • --dynamic-list: implies -Bsymbolic, but the symbols matched by the list are still preemptible. --dynamic-list can also be used with -no-pie/-pie, but there the meaning is different: it indicates that some symbols are exported. I think giving --dynamic-list a double meaning invites confusion and misuse.

The above options make many symbols non-preemptible. With GNU ld 2.35 and LLD 11, you can specify --export-dynamic-symbol=glob to keep some symbols in their original preemptible state. GNU ld 2.35 additionally provides --export-dynamic-symbol-list.
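A minimal sketch of the preemptibility rules described in this and the previous sections, for a defined non-local symbol. It ignores version scripts, and Mode, Bsym, DefinedSym, and isPreemptible are hypothetical names; matchedByList stands for a symbol matched by --dynamic-list or --export-dynamic-symbol.

```cpp
#include <cassert>

enum class Mode { NoPie, Pie, Shared };
enum class Bsym { None, Functions, All }; // -Bsymbolic-functions / -Bsymbolic

// Hypothetical summary of a defined, non-local symbol.
struct DefinedSym {
  bool defaultVisibility; // STV_DEFAULT (other visibilities are never preemptible)
  bool isFunction;        // STT_FUNC
  bool matchedByList;     // matched by --dynamic-list or --export-dynamic-symbol
};

bool isPreemptible(Mode mode, Bsym bsym, bool dynamicList, const DefinedSym &s) {
  // In -no-pie and -pie links, definitions are never preemptible.
  if (mode != Mode::Shared || !s.defaultVisibility)
    return false;
  // Symbols matched by the list keep their preemptible state.
  if (s.matchedByList)
    return true;
  // --dynamic-list implies -Bsymbolic for all other symbols.
  if (dynamicList || bsym == Bsym::All)
    return false;
  if (bsym == Bsym::Functions && s.isFunction)
    return false;
  return true;
}
```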

--discard-none, --discard-locals, and --discard-all

If .symtab is produced, a local symbol defined in a live section is preserved if:

if ((--emit-relocs or -r) && referenced) || --discard-none
  return true
if --discard-all
  return false
if --discard-locals
  return name does not start with .L
# No --discard-* is specified.
return not (name starts with .L and the section is SHF_MERGE)
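The same rules written as a C++ function, assuming the caller already knows whether the symbol is referenced by a relocation and whether its section has SHF_MERGE. Discard, LocalSym, and keepLocal are hypothetical names.

```cpp
#include <cassert>
#include <string>

enum class Discard { Unspecified, None, Locals, All };

// Hypothetical description of a local symbol defined in a live section.
struct LocalSym {
  std::string name;
  bool referenced;     // referenced by a relocation
  bool inMergeSection; // defined in an SHF_MERGE section
};

// Whether the symbol is kept in .symtab. emitRelocsOrR is true for
// --emit-relocs and -r links.
bool keepLocal(const LocalSym &s, Discard d, bool emitRelocsOrR) {
  if ((emitRelocsOrR && s.referenced) || d == Discard::None)
    return true;
  if (d == Discard::All)
    return false;
  bool isDotL = s.name.rfind(".L", 0) == 0; // name starts with .L
  if (d == Discard::Locals)
    return !isDotL;
  return !(isDotL && s.inMergeSection); // no --discard-* specified
}
```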

--strip-all

Do not create .strtab or .symtab.

-u symbol

If an archive defines the symbol specified by -u, the linker pulls the member defining it (the member is converted from an archive member to an object file, after which it is handled like a normal .o).

For example: ld -u foo ... a.a. If a.a does not define any symbol referenced by previous object files, no member of a.a will be pulled. If -u foo is specified, the member of a.a that defines foo will be pulled.

Another use of -u is to specify a GC root.

--version-script=script

The version script has three purposes:

  • Define versions
  • Specify patterns so that the matched, defined, unversioned symbols get the specified version
  • Local version: local: can change the binding of the matched, defined, unversioned symbols to STB_LOCAL, so that they will not be exported to the dynamic symbol table

See Symbol versioning for details.

-y symbol

Often used for debugging. Output where the specified symbol is referenced and defined.

-z muldefs

Alias: --allow-multiple-definition

A symbol is allowed to be defined in multiple files. By default, the linker does not allow two non-local regular definitions (non-weak, non-common) with the same name.

Library related

--as-needed and --no-as-needed

Normally each link-time shared object gets a DT_NEEDED tag in the output, and such a shared object will be loaded by the dynamic loader at runtime. --as-needed avoids DT_NEEDED tags for shared objects that are not actually needed.

--as-needed and --no-as-needed are position-dependent options (an informal term, but I could not find a more appropriate adjective) which affect the shared objects appearing later on the command line. In LLD, a shared object is needed if one of the following conditions is true:

  • it is linked at least once in --no-as-needed mode (i.e. --as-needed a.so --no-as-needed a.so => needed)
  • or it has a definition resolving a non-weak reference from a live section (not discarded by --gc-sections)

In gold, the rule is probably:

  • it is linked at least once in --no-as-needed mode (i.e. --as-needed a.so --no-as-needed a.so => needed)
  • or it has a definition resolving a non-weak reference

In GNU ld:

  • it is linked at least once in --no-as-needed mode (i.e. --as-needed a.so --no-as-needed a.so => needed)
  • or it has a definition resolving a non-weak reference by a previous input file (it works similar to archive selection)
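LLD's rule can be sketched as a walk over the command line that tracks the position-dependent --as-needed state. The set of shared objects resolving a non-weak reference from a live section is taken as an input (computing it requires symbol resolution and --gc-sections, which the sketch does not model); neededLibs is a hypothetical name.

```cpp
#include <algorithm>
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Return the shared objects that get DT_NEEDED under LLD's rule, in the order
// they first appear. Any argument other than the two mode options is treated
// as a shared object name.
std::vector<std::string> neededLibs(const std::vector<std::string> &args,
                                    const std::set<std::string> &resolvesLiveRef) {
  bool asNeeded = false; // --no-as-needed is the default
  std::vector<std::string> order;
  std::set<std::string> linkedNoAsNeeded;
  for (const auto &a : args) {
    if (a == "--as-needed") {
      asNeeded = true;
    } else if (a == "--no-as-needed") {
      asNeeded = false;
    } else {
      if (std::find(order.begin(), order.end(), a) == order.end())
        order.push_back(a);
      if (!asNeeded)
        linkedNoAsNeeded.insert(a); // linked at least once in --no-as-needed mode
    }
  }
  std::vector<std::string> result;
  for (const auto &so : order)
    if (linkedNoAsNeeded.count(so) || resolvesLiveRef.count(so))
      result.push_back(so);
  return result;
}
```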

-Bdynamic and -Bstatic

These two are position-dependent options which affect the -lname options that appear later on the command line.

  • -Bdynamic (default): For -lfoo, search for libfoo.so and libfoo.a in the directory list specified by -L (in each directory, libfoo.so is preferred)
  • -Bstatic: For -lfoo, search for libfoo.a in the directory list specified by -L
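A sketch of the search, assuming the -L directory list is already known. findLibrary is a hypothetical helper, and `exists` stands in for the file system so that the example needs no real files.

```cpp
#include <cassert>
#include <optional>
#include <set>
#include <string>
#include <vector>

// Resolve -l<name> against the -L directories in order. In -Bdynamic mode
// lib<name>.so is tried before lib<name>.a within each directory; in -Bstatic
// mode only lib<name>.a is tried.
std::optional<std::string> findLibrary(const std::string &name,
                                       const std::vector<std::string> &dirs,
                                       bool bstatic,
                                       const std::set<std::string> &exists) {
  for (const auto &dir : dirs) {
    if (!bstatic) {
      std::string so = dir + "/lib" + name + ".so";
      if (exists.count(so))
        return so;
    }
    std::string a = dir + "/lib" + name + ".a";
    if (exists.count(a))
      return a;
  }
  return std::nullopt;
}
```

Note that the directory order takes precedence: a libfoo.a in an earlier directory is chosen over a libfoo.so in a later one, even in -Bdynamic mode.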

Historically -Bstatic and -static are synonymous in GNU ld. The compiler driver option -static is a different option. In addition to passing -static to ld, it also removes the default --dynamic-linker, which affects the linking of libgcc, libc, etc.

--no-dependent-libraries

LLD specific. Ignore the .deplibs section in object files.

This section contains a list of filenames, which LLD adds as additional input files.

-soname=name

Set the DT_SONAME dynamic tag in the dynamic table of the generated shared object.

The linker records each link-time shared object with a DT_NEEDED entry in the dynamic table of the generated executable/shared object. The value is determined as follows:

  • If the shared object contains DT_SONAME, that value is used for DT_NEEDED
  • Otherwise, if the shared object is linked via -l, the value is the base file name
  • Otherwise, the value is the path name (there is a difference between absolute/relative paths)

For example: ld -shared -soname=a.so.1 a.o -o a.so; ld b.o ./a.so, a.out has a DT_NEEDED tag of a.so.1. If the first command does not contain -soname, a.out will have a DT_NEEDED tag of ./a.so.
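The three rules for the DT_NEEDED value can be sketched as follows. LinkedSo and dtNeeded are hypothetical names, and paths are assumed to use / as the separator.

```cpp
#include <cassert>
#include <string>

// Hypothetical description of how a shared object was given to the linker.
struct LinkedSo {
  std::string soname; // DT_SONAME of the shared object, empty if absent
  bool viaDashL;      // found through -lfoo
  std::string path;   // the path as found or as written on the command line
};

// Compute the DT_NEEDED value following the three rules above.
std::string dtNeeded(const LinkedSo &so) {
  if (!so.soname.empty())
    return so.soname;
  if (so.viaDashL) {
    auto slash = so.path.rfind('/');
    return slash == std::string::npos ? so.path : so.path.substr(slash + 1);
  }
  return so.path; // relative paths stay relative, absolute stay absolute
}
```

For the example without -soname, dtNeeded({"", false, "./a.so"}) returns ./a.so.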

--start-group and --end-group

If a.a and b.a reference each other and you are not sure which one will be pulled into the link first, you need this pair of options. For example:

Consider the link order main.o a.a b.a. Assume that main.o references b.a, b.a references a.a, and a.a does not satisfy any previously undefined symbol when it is visited; this link order then causes an error. Can the link order be changed to main.o b.a a.a? If main.o references a.a instead, and b.a does not satisfy any previously undefined symbol, this link order causes an error as well.

One solution is main.o a.a b.a a.a. In many cases, repeating a.a once is enough. However, suppose that only a.a(a.o) is pulled when the first a.a is visited, only b.a(b.o) is pulled when b.a is visited, and only a.a(c.o) is pulled when the second a.a is visited. If a.a(c.o) needs another member of b.a, this link order will still cause an undefined symbol error.

We can repeat b.a again, that is main.o a.a b.a a.a b.a, but a better solution is main.o --start-group a.a b.a --end-group.
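The scenario above can be reproduced in a toy model, assuming GNU ld-style processing in command-line order. Member, Archive, and resolveArchives are hypothetical; each member is reduced to the symbols it defines and references.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical archive member: the symbols it defines and references.
struct Member {
  std::string name; // assumed unique, so a repeated archive is not pulled twice
  std::set<std::string> defs, refs;
};
using Archive = std::vector<Member>;

// Visit archives in command-line order, pulling members that define a
// currently undefined symbol. With group == true the whole sequence is
// rescanned until nothing more is pulled, like --start-group/--end-group.
// Returns the symbols that remain undefined.
std::set<std::string> resolveArchives(std::set<std::string> undefined,
                                      const std::vector<Archive> &archives,
                                      bool group) {
  std::set<std::string> defined, pulled;
  bool changed = true;
  while (changed) {
    changed = false;
    for (const Archive &ar : archives) {
      // Rescan one archive until it contributes nothing new.
      for (bool again = true; again;) {
        again = false;
        for (const Member &m : ar) {
          if (pulled.count(m.name))
            continue;
          bool wanted = false;
          for (const auto &s : m.defs)
            wanted = wanted || undefined.count(s);
          if (!wanted)
            continue;
          pulled.insert(m.name);
          again = changed = true;
          for (const auto &s : m.defs) {
            defined.insert(s);
            undefined.erase(s);
          }
          for (const auto &s : m.refs)
            if (!defined.count(s))
              undefined.insert(s);
        }
      }
    }
    if (!group)
      break;
  }
  return undefined;
}
```

With a.a = {a.o, c.o} and b.a = {b.o, e.o} wired as in the example, the order a.a b.a a.a leaves the symbol needed from e.o undefined, while the group version reaches a fixpoint and resolves everything.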

--start-lib and --end-lib

This is a useful feature invented by gold which can replace thin archives. Regular object files surrounded by --start-lib and --end-lib have archive selection semantics.

If a.a contains b.o c.o, ld ... --start-lib b.o c.o --end-lib works like ld ... a.a.

I submitted a feature request for GNU ld: https://sourceware.org/bugzilla/show_bug.cgi?id=24600

--sysroot

This is different from the --sysroot driver option. If a linker script is located in the sysroot directory, when it opens a file by absolute path (in INPUT or GROUP), the linker prepends the sysroot to the absolute path.

-t --trace

Print relocatable object files, shared objects, and extracted archive members.

--whole-archive and --no-whole-archive

An archive after --whole-archive is treated as a collection of .o files without the lazy semantics. If a.a contains b.o c.o, then ld --whole-archive a.a --no-whole-archive has the same effect as ld b.o c.o.

--push-state and --pop-state

-Bstatic, --whole-archive, --as-needed, etc. are all position-dependent options that each represent a boolean state. --push-state saves the boolean states of these options, and --pop-state restores them.

When inserting a new option into the link command line to change the state, you usually want to restore the state afterwards; --push-state and --pop-state serve this purpose. For example, to make sure libc++.a and libc++abi.a are linked, you can use -Wl,--push-state,-Bstatic -lc++ -lc++abi -Wl,--pop-state.
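A toy model of how --push-state and --pop-state save and restore these boolean states. State and statesAtLibs are hypothetical; the sketch records the state in effect when each -l option is seen and simply ignores an unmatched --pop-state.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical model of the boolean states of position-dependent options.
struct State {
  bool bstatic = false, wholeArchive = false, asNeeded = false;
};

// Apply options in order and record the state in effect at each -l option.
std::vector<std::pair<std::string, State>>
statesAtLibs(const std::vector<std::string> &args) {
  State cur;
  std::vector<State> saved; // stack for --push-state/--pop-state
  std::vector<std::pair<std::string, State>> libs;
  for (const auto &a : args) {
    if (a == "-Bstatic") cur.bstatic = true;
    else if (a == "-Bdynamic") cur.bstatic = false;
    else if (a == "--whole-archive") cur.wholeArchive = true;
    else if (a == "--no-whole-archive") cur.wholeArchive = false;
    else if (a == "--as-needed") cur.asNeeded = true;
    else if (a == "--no-as-needed") cur.asNeeded = false;
    else if (a == "--push-state") saved.push_back(cur);
    else if (a == "--pop-state" && !saved.empty()) {
      cur = saved.back();
      saved.pop_back();
    } else if (a.rfind("-l", 0) == 0)
      libs.emplace_back(a, cur);
  }
  return libs;
}
```

For the linker arguments of the example above followed by -lm, -lc++ and -lc++abi see -Bstatic, while -lm sees the restored -Bdynamic state.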

Dependency related

See Dependency related linker options for details.

-z defs and -z undefs

Whether to report an error for an unresolved undefined symbol from a regular object. "unresolved" means that the symbol is not defined by a regular object file or a link-time shared object. Executable links default to -z defs/--no-undefined (not allowed) and -shared links default to -z undefs (allowed).

Many build systems enable -z defs, requiring shared objects to specify all dependencies when linking (link what you use).

--allow-shlib-undefined and --no-allow-shlib-undefined

Whether to report an error for an unresolved undefined symbol from a shared object. Executable links default to --no-allow-shlib-undefined (not allowed) and -shared links default to --allow-shlib-undefined (allowed).

For the following code, an error will be reported when linking the executable file:

// a.so
void f();
void g() { f(); }

// exe
void g();
int main() { g(); }

If you enable --allow-shlib-undefined, the link will succeed, but ld.so will report an error at runtime. In glibc, the error is symbol lookup error: ... undefined symbol:.

GNU ld has a complex algorithm to find the transitive closure of dependencies: an error is reported only when no shared object in the transitive closure can resolve an undefined symbol. gold and LLD use a simplified rule: if all DT_NEEDED dependencies of a shared object are directly linked, the error is enabled; if some of the dependencies are not linked, gold/LLD cannot accurately determine whether an indirectly linked shared object can provide a definition, so they are conservative and do not report errors.
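gold's and LLD's simplified rule can be sketched as below, assuming the set of symbols defined by the link has been computed beforehand. Shlib and shlibErrors are hypothetical names, and the message format is made up.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical summary of one link-time shared object.
struct Shlib {
  std::string name;                // e.g. its DT_SONAME
  std::vector<std::string> needed; // its DT_NEEDED entries
  std::set<std::string> undefined; // undefined symbols it references
};

// Report unresolved undefined symbols of a shared object only if all of its
// DT_NEEDED dependencies were linked directly. `defined` holds all symbols
// defined by the link.
std::vector<std::string> shlibErrors(const std::vector<Shlib> &libs,
                                     const std::set<std::string> &defined) {
  std::set<std::string> linked;
  for (const auto &l : libs)
    linked.insert(l.name);
  std::vector<std::string> errors;
  for (const auto &l : libs) {
    bool allDepsLinked = true;
    for (const auto &n : l.needed)
      if (!linked.count(n))
        allDepsLinked = false;
    if (!allDepsLinked)
      continue; // an unlinked dependency might define it: stay silent
    for (const auto &s : l.undefined)
      if (!defined.count(s))
        errors.push_back(l.name + ": undefined symbol: " + s);
  }
  return errors;
}
```

With the a.so example above (a.so references f, which nothing defines, and has no DT_NEEDED), one error is reported; a shared object with an unlinked DT_NEEDED dependency is skipped.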

It is worth mentioning that -z defs/-z undefs/--no-undefined and --[no-]allow-shlib-undefined can be controlled by an option --unresolved-symbols.

--warn-backrefs

Specific to lld, see http://lld.llvm.org/elf/warn_backrefs.html.

Layout related

--no-rosegment

By default LLD uses the two RW PT_LOAD design:

  • R PT_LOAD
  • RX PT_LOAD
  • RW PT_LOAD (overlaps with PT_GNU_RELRO)
  • RW PT_LOAD

Specify this option to combine the R PT_LOAD and the RX PT_LOAD.

-z separate-loadable-segments

LLD's traditional layout: no two PT_LOAD segments overlap (a byte will not be loaded into two memory mappings at the same time).

This is implemented by aligning the address of each new PT_LOAD to max-page-size. LLD creates 4 PT_LOAD segments (R, RX, RW (RELRO), RW (non-RELRO)), so the three alignments in the output file may waste some bytes. On AArch64 and PowerPC, because the max-page-size specified by the ABI is larger (65536), up to 65536*3 bytes can be wasted.
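The worst case can be checked with a little arithmetic. This toy layout starts at offset 0 and places segments of the given sizes back to back, aligning every segment after the first to max-page-size; alignTo and paddingBytes are hypothetical helpers.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Round x up to a multiple of align (a power of two).
constexpr uint64_t alignTo(uint64_t x, uint64_t align) {
  return (x + align - 1) & ~(align - 1);
}

// Return the total padding when each PT_LOAD after the first starts at a
// max-page-size boundary.
uint64_t paddingBytes(const std::vector<uint64_t> &sizes, uint64_t maxPageSize) {
  uint64_t off = 0, pad = 0;
  for (std::size_t i = 0; i < sizes.size(); ++i) {
    if (i > 0) {
      uint64_t aligned = alignTo(off, maxPageSize);
      pad += aligned - off;
      off = aligned;
    }
    off += sizes[i];
  }
  return pad;
}
```

With four 0x100-byte segments and a max-page-size of 65536, the padding is 3*0xff00 bytes, close to the 65536*3 upper bound.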

-z separate-code

Introduced in binutils 2.31, it is the default on Linux/x86. GNU ld has the following layout:

  • R PT_LOAD
  • RX PT_LOAD
  • R PT_LOAD
  • RW PT_LOAD
    • PT_GNU_RELRO part
    • Non-PT_GNU_RELRO part

The idea is that a byte in the file that is mapped into the executable (RX) PT_LOAD will not also be mapped into an R PT_LOAD. Note that the R after RX is not great. A better layout would merge this R with the first R, but that seems to be difficult to implement in GNU ld.

I introduced this option in LLD 10. The semantics are similar to GNU ld's but the layout is different: the two RW PT_LOAD segments are allowed to overlap, which means that the address of the second RW PT_LOAD does not need to be aligned, and at most max-page-size*2 bytes can be wasted.

-z noseparate-code

This is GNU ld's classic layout allowing the executable section to overlap with other PT_LOAD.

  • RX PT_LOAD
  • RW PT_LOAD
    • The prefix part is PT_GNU_RELRO. ld.so makes this part read-only with mprotect after processing dynamic relocations.
    • The part that is not PT_GNU_RELRO. This part is always writable at runtime.

The first PT_LOAD is often called text segment, but the term is inaccurate because the segment has read-only data as well.

This layout is used by default in LLD 10, and there is no need to align any PT_LOAD.

Relocation related

--apply-dynamic-relocations

Some psABIs use the RELA format (AArch64, PowerPC, RISC-V, x86-64, etc.): relocations contain an addend field. On such targets, --apply-dynamic-relocations requires the linker to set the initial value of each relocated location to the addend instead of 0. If the executable/shared object is compressed, --no-apply-dynamic-relocations can improve compression.

--emit-relocs

This option makes -no-pie/-pie/-shared links keep input relocations, in a way similar to -r. It can be used for binary analysis after linking. The only two uses I know of are CONFIG_RELOCATABLE of the x86 Linux kernel and BOLT.

--pack-dyn-relocs=value

--pack-dyn-relocs=relr enables DT_RELR, a more compact encoding format for relative relocations (R_*_RELATIVE). Relative relocations are common in position-independent executables.
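The DT_RELR encoding can be sketched as follows for a 64-bit target. An even entry is an address to relocate; an odd entry is a bitmap in which bit k (k >= 1) marks the word at base + (k-1)*8, after which base advances by 63 words. The sketch assumes sorted, unique, 8-byte-aligned offsets; encodeRelr and decodeRelr are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr uint64_t kWord = 8;  // bytes per relocated word
constexpr uint64_t kBits = 63; // words covered by one bitmap entry

std::vector<uint64_t> encodeRelr(const std::vector<uint64_t> &offsets) {
  std::vector<uint64_t> out;
  std::size_t i = 0, n = offsets.size();
  while (i < n) {
    // Address entry (LSB 0): relocate at this offset.
    out.push_back(offsets[i]);
    uint64_t base = offsets[i] + kWord;
    ++i;
    // Bitmap entries (LSB 1) for the following words, while any are hit.
    for (;;) {
      uint64_t bitmap = 0;
      while (i < n) {
        uint64_t d = offsets[i] - base;
        if (d >= kBits * kWord || d % kWord)
          break;
        bitmap |= uint64_t(1) << (d / kWord);
        ++i;
      }
      if (!bitmap)
        break;
      out.push_back((bitmap << 1) | 1);
      base += kBits * kWord;
    }
  }
  return out;
}

std::vector<uint64_t> decodeRelr(const std::vector<uint64_t> &entries) {
  std::vector<uint64_t> offsets;
  uint64_t where = 0;
  for (uint64_t e : entries) {
    if ((e & 1) == 0) {
      offsets.push_back(e);
      where = e + kWord;
    } else {
      for (uint64_t k = 0; (e >>= 1) != 0; ++k)
        if (e & 1)
          offsets.push_back(where + k * kWord);
      where += kBits * kWord;
    }
  }
  return offsets;
}
```

Five relative relocations at 0x1000, 0x1008, 0x1010, 0x1040, and 0x2000 become three entries: the address 0x1000, one bitmap covering the next three, and the address 0x2000.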

-z text and -z notext

-z text does not allow text relocations. -z notext allows text relocations.

Starting from binutils 2.35, GNU ld on Linux/x86 enables the configure-time option --enable-textrel-warning=warning by default, and a warning is given if there are text relocations.

The term "text relocations" is inaccurate: it actually refers to dynamic relocations applied to sections without the SHF_WRITE flag. If the value of a relocation in a .o cannot be determined at link time, it needs to be converted to a dynamic relocation (often of the same type as the one in the .o) and computed by ld.so at runtime. If the target section does not have the SHF_WRITE flag, ld.so has to temporarily call mprotect to change the permissions of the memory mapping, apply the modification, and restore the previous read-only permission, which hinders page sharing.

Text relocations are more common in shared objects than in executables: executables can use canonical PLT entries and copy relocations to avoid certain text relocations.

Which relocation types are allowed as text relocations varies by linker and architecture. GNU ld may allow quite a few relocation types that glibc ld.so supports. On x86-64, the linker allows R_X86_64_64 and R_X86_64_PC64.

In the following assembly, defined_in_so is a symbol defined in a shared object. The comments give the scenario in which each line produces a text relocation.

.globl global
global:
local:
  .quad local               # (-pie or -shared) R_X86_64_RELATIVE
  .quad global              # (-pie) R_X86_64_RELATIVE or (-shared) R_X86_64_64
  .quad defined_in_so       # (-shared) R_X86_64_64
  .quad defined_in_so - .   # (-shared) R_X86_64_PC64

In -no-pie or -pie mode, the linker will make different choices according to the symbol type of defined_in_so:

  • STT_FUNC: generate a canonical PLT entry
  • STT_OBJECT: generate a copy relocation
  • STT_NOTYPE: GNU ld generates a copy relocation; LLD generates a text relocation

Section related

--gc-sections

Specify -ffunction-sections or -fdata-sections at compile time to make it effective, so that each function or data object gets its own section. The linker does liveness analysis to remove unused sections from the output.

See Linker garbage collection for details.

-z start-stop-gc and -z nostart-stop-gc

-z start-stop-gc means a __start_foo or __stop_foo reference from a live section does not retain all foo input sections.

-z nostart-stop-gc means a __start_foo or __stop_foo reference from a live section retains all foo input sections.

--icf=all and --icf=safe

Enable identical code folding. The name originated from MSVC link.exe, where ICF stands for "identical COMDAT folding". gold named it "identical code folding", which makes sense because gold does not fold read-only data.

This name is not accurate: (1) the feature can apply to readonly data as well; (2) the folding is by section, not by function.

We define sections as identical if they have identical content and their outgoing relocation sets cannot be distinguished: they need to have the same number of relocations, at the same relative locations, with indistinguishable referenced symbols. This is a recursive definition: if .text.a and .text.b reference different symbols at the same location, they can still be indistinguishable if the referenced symbols satisfy the identical code/rodata requirement.

For a set of identical sections, the linker picks one representative, drops the rest, and redirects references to the representative.
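A naive sketch of this recursive definition via partition refinement: start by grouping sections with identical content and relocation offsets, then keep splitting groups whose relocations point to different groups. Section and icf are hypothetical; relocation types and addends are ignored, and real implementations are considerably more elaborate.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical input section: its bytes and its relocations, each given as
// (offset in this section, index of the referenced section).
struct Section {
  std::string content;
  std::vector<std::pair<std::size_t, std::size_t>> relocs;
};

// Assign an equivalence class to every section; equal ids can be folded.
std::vector<std::size_t> icf(const std::vector<Section> &secs) {
  std::vector<std::size_t> cls(secs.size());
  // Initial classes: same content and same relocation offsets.
  std::map<std::pair<std::string, std::vector<std::size_t>>, std::size_t> init;
  for (std::size_t i = 0; i < secs.size(); ++i) {
    std::vector<std::size_t> offsets;
    for (const auto &r : secs[i].relocs)
      offsets.push_back(r.first);
    cls[i] = init.emplace(std::make_pair(secs[i].content, offsets), init.size())
                 .first->second;
  }
  // Refine: split classes by the classes of relocation targets.
  using Key = std::pair<std::size_t, std::vector<std::pair<std::size_t, std::size_t>>>;
  for (std::size_t numClasses = 0;;) {
    std::map<Key, std::size_t> ids;
    std::vector<std::size_t> next(secs.size());
    for (std::size_t i = 0; i < secs.size(); ++i) {
      Key k;
      k.first = cls[i];
      for (const auto &r : secs[i].relocs)
        k.second.push_back({r.first, cls[r.second]});
      next[i] = ids.emplace(k, ids.size()).first->second;
    }
    cls = next;
    if (ids.size() == numClasses)
      break; // no class was split: the partition is stable
    numClasses = ids.size();
  }
  return cls;
}
```

Because the refinement only splits classes, two identical sections that reference each other stay in the same class.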

gold implements --icf=safe based on relocation. LLD implements --icf=safe based on the LLVM address significance table.

--symbol-ordering-file=file

Specify a text file with one defined symbol per line. Within each input section description, if symbol a appears before symbol b in the file, the section defining a is sorted before the section defining b.

If a symbol is not defined or the section in which it is located is discarded, the linker will output a warning unless --no-warn-symbol-ordering is specified.

If one function frequently calls another, placing the input sections containing the two functions close to each other in the linked image increases the probability that they fall on the same page, which reduces the page working set and TLB thrashing. See Profile Guided Code Positioning by Karl Pettis and Robert C. Hansen.

This option is unique to LLD. gold has --section-ordering-file, which sorts by section name. In practice, that requires text/data sections to have distinct names (you cannot use clang -fno-unique-section-names). Sorting based on symbol names works with -fno-unique-section-names as well.
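The sorting that --symbol-ordering-file performs can be sketched as a stable sort by priority. The exact priority rules are an assumption here (matching my understanding of LLD): a section's priority is the earliest line naming one of its symbols, listed sections go first, and unlisted sections keep their relative order. InputSec and orderSections are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical input section: its name and the symbols defined in it.
struct InputSec {
  std::string name;
  std::vector<std::string> syms;
};

// Sort the sections of one input section description.
void orderSections(std::vector<InputSec> &secs,
                   const std::vector<std::string> &orderFile) {
  std::map<std::string, std::size_t> line;
  for (std::size_t i = 0; i < orderFile.size(); ++i)
    line.emplace(orderFile[i], i); // keep the first occurrence
  auto priority = [&](const InputSec &s) {
    std::size_t p = orderFile.size(); // larger than any listed line
    for (const auto &sym : s.syms) {
      auto it = line.find(sym);
      if (it != line.end() && it->second < p)
        p = it->second;
    }
    return p;
  };
  std::stable_sort(secs.begin(), secs.end(),
                   [&](const InputSec &a, const InputSec &b) {
                     return priority(a) < priority(b);
                   });
}
```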

Analysis related

--cref

Output the cross reference table. For each non-local symbol, output the file defining it and the list of files referencing it.

-M and -Map=file

Output the link map, which shows the addresses and file offsets of the output sections and the input sections they include.

Warning related

--fatal-warnings

Turn warnings into errors. Besides whether the message contains the string "warning" or "error", the more important difference is that an error prevents the linker from writing the output.

--noinhibit-exec

Turn some errors into warnings. Be careful not to also specify --fatal-warnings, which would upgrade the downgraded warnings back to errors :)

Others

--build-id=value

Generate .note.gnu.build-id to give the output an identifier. The identifier is generated by hashing the whole output.

SHA-1 is the most common choice. The linker fills the content of .note.gnu.build-id with zeros, hashes the whole output, and writes the result back into .note.gnu.build-id. Some linkers use tree-style hashes.
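A sketch of the zero-fill-then-hash procedure. It uses 64-bit FNV-1a as a short stand-in for SHA-1, and writeBuildId assumes a hypothetical 8-byte ID field at a known offset (a SHA-1 build ID is 20 bytes).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// 64-bit FNV-1a, used here as a short stand-in for SHA-1.
uint64_t fnv1a(const std::vector<uint8_t> &buf) {
  uint64_t h = 0xcbf29ce484222325ULL; // FNV offset basis
  for (uint8_t b : buf) {
    h ^= b;
    h *= 0x100000001b3ULL; // FNV prime
  }
  return h;
}

// Fill the ID field with zeros, hash the whole image, and write the hash back.
void writeBuildId(std::vector<uint8_t> &image, std::size_t off) {
  for (std::size_t i = 0; i < 8; ++i)
    image[off + i] = 0;
  uint64_t h = fnv1a(image);
  for (std::size_t i = 0; i < 8; ++i)
    image[off + i] = uint8_t(h >> (8 * i)); // little-endian
}
```

Because the field is zeroed before hashing, running the procedure twice gives the same ID, and the ID depends only on the rest of the output.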

--compress-debug-sections=zlib

Use zlib to compress the .debug_* sections of the output file and mark them SHF_COMPRESSED. SHF_COMPRESSED was the last feature merged into the ELF specification, after which the ELF specification has not been maintained...

--hash-style=style

The ELF specification requires a hash table DT_HASH for dynamic symbol lookup. --hash-style=sysv generates the table.

DT_GNU_HASH is better than DT_HASH in terms of space consumption and performance. MIPS uses the alternative DT_MIPS_XHASH (a good example of the MIPS ABI suffering from its own wisdom). I personally think DT_MIPS_XHASH is solving the wrong problem. In fact, there is a way to use DT_GNU_HASH, but people in the MIPS community may not want to bother with it again.
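For reference, the two hash functions can be written as follows: the SysV ELF hash used by DT_HASH and the h*33+c hash (seeded with 5381) used by DT_GNU_HASH. Only the hash functions are shown; the table layouts (buckets, chains, and DT_GNU_HASH's Bloom filter) are not.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// The SysV ELF hash used by DT_HASH.
uint32_t sysvHash(const std::string &name) {
  uint32_t h = 0;
  for (unsigned char c : name) {
    h = (h << 4) + c;
    uint32_t g = h & 0xf0000000;
    if (g)
      h ^= g >> 24;
    h &= ~g;
  }
  return h;
}

// The hash used by DT_GNU_HASH: h = h * 33 + c, starting from 5381.
uint32_t gnuHash(const std::string &name) {
  uint32_t h = 5381;
  for (unsigned char c : name)
    h = h * 33 + c;
  return h;
}
```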

--no-ld-generated-unwind-info

See PR12570 ".plt has no associated .eh_frame/.debug_frame".

When the PC is in a PLT entry and the linker does not synthesize .eh_frame information, unwinding from the current PC will not get frames. On i386 and x86-64, in the lazy binding state, the first call of a PLT entry executes the push instruction. After esp/rsp is changed, if the PLT entry does not have unwind information provided by .eh_frame, the unwinder may not be able to unwind correctly, which affects the accuracy of profilers.

jmp *got(%rip)
pushq $0x0
jmpq .plt

However, I think this feature is obsolete and irrelevant nowadays. To recognize the PLT name, a profiler needs to:

  • Parse the .plt section to know the region of PLT entries
  • Parse .rel[a].plt to get R_*_JUMP_SLOT dynamic relocations and their referenced symbol names.
  • If the current PC is within the PLT region, parse nearby instructions and find the GOT load. The associated R_*_JUMP_SLOT identifies the symbol name.
  • Concatenate the symbol name and @plt to form foo@plt

Note: foo@plt is a convention used by some tools, but it is not a name in the symbol table.
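The steps above involve disassembling nearby instructions. A simpler heuristic, sketched here as a swapped-in technique rather than the method described above, assumes the classic x86-64 lazy PLT layout (a 16-byte PLT0 followed by 16-byte entries in .rela.plt order); pltName is a hypothetical helper. It does not hold for other PLT layouts such as IBT-enabled ones.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// Map a PC to a "foo@plt" name, assuming entry i of the PLT corresponds to
// the i-th R_X86_64_JUMP_SLOT relocation in .rela.plt.
std::optional<std::string> pltName(uint64_t pc, uint64_t pltStart, uint64_t pltSize,
                                   const std::vector<std::string> &jumpSlotSyms) {
  const uint64_t kHeader = 16, kEntry = 16;
  if (pc < pltStart + kHeader || pc >= pltStart + pltSize)
    return std::nullopt; // outside the PLT entries (or inside PLT0)
  uint64_t i = (pc - pltStart - kHeader) / kEntry;
  if (i >= jumpSlotSyms.size())
    return std::nullopt;
  return jumpSlotSyms[i] + "@plt";
}
```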

gdb has heuristics to identify this situation.

This problem does not affect C++ exceptions. The PLT entry is a tail call, and the _Unwind_RaiseException called by __cxa_throw unwinds through the tail calls of the ld.so resolver and the PLT entry. The PC is restored to the instruction following the call in the caller of the PLT entry.

// b.cc - b.so
void ext() { throw 3; }

// a.cc - exe
#include <stdio.h>

void ext();
void foo() {
  try {
    ext(); // PLT entry
  } catch (int x) {
    printf("%d\n", x);
  }
}

int main() {
  foo();
}

-O

Enable optimizations. The optimization level is different from the compiler driver option -O.

In LLD, -O0 disables constant merging of SHF_MERGE sections; -O2 enables string suffix merging of SHF_MERGE|SHF_STRINGS sections and makes --compress-debug-sections=zlib use a higher zlib compression level.

-plugin file

GNU ld and gold support this option to load the GCC LTO plugin (liblto_plugin.so) or the LLVM LTO plugin (LLVMgold.so).

binutils-gdb/include/plugin-api.h defines the plugin API.

Despite its name containing gold, LLVMgold.so can be used by GNU ld, nm, and ar as well.

解析GNU风味的linker options

编译器driver options

在描述链接器选项前先介绍一下driver options。通常使用gccclang,指定的都是driver options。一些driver options会影响传递给链接器的选项。 有些driver options和链接器重名,它们往往在传递给链接器同名选项之外还有额外功效,比如:

  • -shared: 不设置-dynamic-linker,不链接crt1.o
  • -static: 不设置-dynamic-linker,使用crtbeginT.o而非crtbegin.o,使用--start-group链接-lgcc -lgcc_eh -lc(它们有(不好的)循环依赖)

-Wl,--foo,value,--bar=value会传递--foovalue--bar=value三个选项给链接器。 如果有大量链接选项,可以每行一行放在一个文本文件response.txt里,然后指定-Wl,@response.txt

注意,-O2不会传递-O2给链接器,-Wl,-O2则会。

  • -fno-pic,-fno-PIC是同义的,生成position-dependent code
  • -fpie,-fPIE分别叫做small PIE、large PIE,在PIC基础上引入了一个优化:编译的.o只能用于可执行档。参见下文的-Bsymbolic
  • -fpic,-fPIC分别叫做small PIC、large PIC,position-independent code。在32-bit PowerPC和Sparc上(即将退出历史舞台的架构)两种模式有代码生成差异。大多数架构没有差异。

链接器接受几类输入。对于符号,每个输入文件的符号表都会影响符号解析;对于sections,只有regular object files里的sections(称为input sections)会拼接得到输出文件的output sections。

  • .o (regular object files)
  • .so (shared objects): 只影响符号解析
  • .a (archive files)

Archive member selection

.a文件具有archive member selection的特殊语义。每个成员都是惰性的。 如果链接器发现.a中的某个archive member定义了某个之前被引用但尚未定义的符号,则会从archive中pull这个member。 该member会在概念上成为一个regular object file,其符号表被用于符号解析,且贡献input sections,之后的处理方式就和.o没有任何差异了。

若该archive不能满足之前的某个undefined符号,GNU ld和gold会跳过该archive,详见--warn-backrefs

Thin archive的链接语义和regular archive相同。

--start-group可以改变archive member selection语义。 --whole-archive可以取消archive member selection,还原object file语义。

以下四种链接模式四选一,控制输出文件的类型(可执行档/shared object/relocatable object):

  • -no-pie (default): 生成position-dependent executable (ET_EXEC)。要求最宽松,源文件可用-fno-pic,-fpie,-fpic编译
  • -pie: 生成position-independent executable (ET_DYN)。源文件须要用-fpie,-fpic编译
  • -shared: 生成position-independent shared object (ET_DYN)。最严格,源文件须要用-fpic编译
  • -r: relocatable link,不生成linker synthesized sections,且保留relocations

-pie可以和-shared都是position-independent的链接模式。-pie也可以和-no-pie都是可执行档的链接模式。 -pie-shared -Bsymbolic很相似,但它毕竟是可执行档,以下行为和-no-pie贴近而与-shared不同:

  • 允许copy relocation和canonical PLT
  • 允许relax General Dynamic/Local Dynamic TLS models和TLS descriptors到Initial Exec/Local Exec
  • 会链接时解析undefined weak,(LLD行为)不生成dynamic relocation。GNU ld是否生成dynamic relocation有非常复杂的规则,且和架构相关

容易产生混淆的是,编译器driver提供了几个同名选项:-no-pie,-pie,-shared,-r。 GCC 6引入了configure-time选项--enable-default-pie:启用该选项的GCC预设-pie-fPIE。现在,很多Linux发行版都启用了该选项作为基础的security hardening。

--exclude-libs

If a matched archive defines a non-local symbol, don't export this symbol.

--export-dynamic

Shared objects预设导出所有non-local STV_DEFAULT/STV_PROTECTED定义符号到dynamic symbol table。可执行档可用--export-dynamic模拟shared objects行为。

下面描述可执行档/shared object里一个符号被导出的规则(logical AND):

  • non-local STV_DEFAULT/STV_PROTECTED (this means it can be hid by --exclude-libs)
  • logical OR of the following:
    • undefined
    • (--export-dynamic || -shared) && ! (unnamed_addr linkonce_odr GlobalVariable || local_unnamed_addr linkonce_odr constant GlobalVariable)
    • matched by --dynamic-list/--export-dynamic-symbol-list/--export-dynamic-symbol
    • defined or referenced by a shared object as STV_DEFAULT
    • STV_PROTECTED definition in a shared object preempted by copy relocation/canonical PLT when --ignore-{data,function}-address-equality} is specified
    • -z ifunc-noplt && has at least one relocation

如果可执行档定义了在某个链接时shared object引用了一个符号,那么链接器需要导出该符号,使得运行时该shared object的undefined符号可以绑定到可执行档中的定义。

-Bsymbolic and --dynamic-list

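In ELF, a non-local STV_DEFAULT defined symbol in a shared object is preemptible (interposable) by default. At runtime, the definition may be replaced by a definition in the executable or in another shared object. A definition in an executable is guaranteed to be non-preemptible (non-interposable). Code compiled with -fPIC is assumed to possibly end up in a shared object. A module is an executable or a shared object. For -fPIC code, references to definitions within the same module therefore incur unnecessary overhead by default: indirection through the GOT or PLT.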

The linker provides several mechanisms to make some symbols non-preemptible: -Bsymbolic, -Bsymbolic-functions, version scripts and --dynamic-list. They yield behavior similar to -no-pie and -pie.

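  • -Bsymbolic: all defined symbols are non-preemptible
  • -Bsymbolic-functions: all defined STT_FUNC (function) symbols are non-preemptible
  • --dynamic-list: implies -Bsymbolic, but symbols matched by the list remain preemptible. --dynamic-list can also be used with -no-pie/-pie, where it has a different meaning: export some symbols. I think the dual meaning of --dynamic-list is confusing and invites misuse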

The options above make many symbols non-preemptible. In GNU ld 2.35 and LLD 11, --export-dynamic-symbol=glob keeps some symbols preemptible. GNU ld 2.35 additionally provides --export-dynamic-symbol-list.

--discard-none, --discard-locals, and --discard-all

If .symtab is emitted, a local symbol defined in a live section is kept under the following conditions:

if ((--emit-relocs or -r) && referenced) || --discard-none
  return true
if --discard-all
  return false
if --discard-locals
  return is not .L
# No --discard-* is specified.
return not (.L in a SHF_MERGE section)
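The pseudocode above maps directly onto C. In this sketch, `enum discard`, `struct local_sym` and `keep_local` are hypothetical names. It assumes the symbol's section is live and .symtab is being emitted. It also assumes "referenced" means referenced by a relocation.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical encoding of the last --discard-* option (if any). */
enum discard { DISCARD_UNSPECIFIED, DISCARD_NONE, DISCARD_LOCALS, DISCARD_ALL };

/* Hypothetical view of a local symbol defined in a live section. */
struct local_sym {
  const char *name;
  bool referenced;       /* referenced by a relocation */
  bool in_merge_section; /* defined in an SHF_MERGE section */
};

bool keep_local(const struct local_sym *s, enum discard d, bool emit_relocs_or_r) {
  bool dot_l = strncmp(s->name, ".L", 2) == 0; /* assembler-local label */
  if ((emit_relocs_or_r && s->referenced) || d == DISCARD_NONE)
    return true;
  if (d == DISCARD_ALL)
    return false;
  if (d == DISCARD_LOCALS)
    return !dot_l;
  /* No --discard-* option: drop only .L symbols in SHF_MERGE sections. */
  return !(dot_l && s->in_merge_section);
}
```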

--strip-all

Do not create .strtab and .symtab.

-u symbol

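If an archive file defines the symbol specified by -u, the member defining it is pulled. Pulling converts the archive member into an object file, which is afterwards treated like a regular .o.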

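Consider ld -u foo ... a.a. Without -u, if a.a does not define any symbol referenced by the preceding object files, nothing from a.a is pulled. With -u foo, the archive member of a.a that defines foo is pulled.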

Another effect of -u is to specify a GC root.

--version-script=script

A version script has three uses:

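  • Define versions
  • Specify patterns so that matched, defined, unversioned symbols get the specified version
  • Local version: local: changes the binding of matched, defined, unversioned symbols to STB_LOCAL, so they are not exported to the dynamic symbol table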

See Symbol versioning for a detailed description of the symbol versioning mechanism.

-y symbol

Commonly used for debugging. Prints where the specified symbol is referenced and where it is defined.

-z muldefs

Allow duplicate symbol definitions. By default, the linker rejects two non-local regular definitions (non-weak, non-common) with the same name.

Library-related options

--as-needed and --no-as-needed

Prevent unused link-time shared objects from leaving DT_NEEDED entries.

--as-needed--no-as-needed是position-dependent选项(非正式叫法,但没找到更贴切的形容词),影响后面命令行出现的shared objects。一个shared object is needed,如果下面条件之一成立:

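  • It appears on the command line at least once in --no-as-needed mode
  • It defines a symbol that has a non-weak reference from a live section of a .o. A definition that is referenced only weakly may therefore still be considered unneeded. References from sections discarded by --gc-sections do not count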

-Bdynamic and -Bstatic

These two options are position-dependent options that affect the -lname options appearing after them on the command line.

  • -Bdynamic (default): for -lfoo, search the directories specified by -L for libfoo.so and libfoo.a
  • -Bstatic: for -lfoo, search the directories specified by -L for libfoo.a only
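A sketch of the lookup, assuming the per-directory order from the GNU ld manual. In -Bdynamic mode, each directory is checked for libfoo.so and then libfoo.a before moving on. A libfoo.a in an earlier directory therefore wins over a libfoo.so in a later one. To keep the sketch testable, a hypothetical list of existing files stands in for the filesystem. `find_library` is an illustrative name, and -l:filename and other details are ignored.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the filesystem: the list of files that exist. */
static bool present(const char *path, const char *const *files, int nfiles) {
  for (int i = 0; i < nfiles; i++)
    if (strcmp(path, files[i]) == 0) return true;
  return false;
}

/* Resolves -l<name> against the -L directories and writes the path to out. */
bool find_library(const char *const *dirs, int ndirs, const char *name,
                  bool bstatic, const char *const *files, int nfiles,
                  char *out, size_t outsz) {
  for (int i = 0; i < ndirs; i++) {
    if (!bstatic) { /* -Bdynamic: prefer the shared object in this directory */
      snprintf(out, outsz, "%s/lib%s.so", dirs[i], name);
      if (present(out, files, nfiles)) return true;
    }
    snprintf(out, outsz, "%s/lib%s.a", dirs[i], name);
    if (present(out, files, nfiles)) return true;
  }
  return false;
}
```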

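Note that historically, -Bstatic and -static are synonyms in GNU ld. The compiler driver's -static is a different option. Besides passing -static to ld, it also drops the default --dynamic-linker and affects how libgcc, libc, etc. are linked.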

--no-dependent-libraries

Ignore the .deplibs section in object files.

-soname=name

Set DT_SONAME in the dynamic table of the output shared object.

The linker records the link-time shared objects. In the dynamic table of the output executable/shared object, each link-time shared object is described by one DT_NEEDED entry:

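  • If the shared object has DT_SONAME, that field provides the DT_NEEDED value
  • Otherwise, if it was linked via -l, the value is the file name without the directory
  • Otherwise, the value is the path name (so absolute and relative paths give different values)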

For example, after ld -shared -soname=a.so.1 a.o -o a.so; ld b.o ./a.so, the DT_NEEDED of a.out is a.so.1. If the first command does not contain -soname, the DT_NEEDED of a.out is ./a.so.
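The three rules above fit in a few lines of C. `needed_value` is a hypothetical helper written for illustration. It takes the shared object's DT_SONAME (or NULL), the path as given to the linker, and whether the object was found through -l.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: the DT_NEEDED string for one link-time shared object. */
const char *needed_value(const char *soname, const char *path, bool via_l) {
  if (soname)
    return soname;                    /* DT_SONAME takes precedence */
  if (via_l) {
    const char *slash = strrchr(path, '/');
    return slash ? slash + 1 : path;  /* -lfoo: strip the directory */
  }
  return path;                        /* kept verbatim: ./a.so stays ./a.so */
}
```

With the example above, `needed_value("a.so.1", "./a.so", false)` yields a.so.1. Without -soname, `needed_value(NULL, "./a.so", false)` yields ./a.so.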

--start-group and --end-group

如果A.aB.a有相互引用,且不能确定哪一个会被先pull into the link,得使用这对选项。下面给出一个例子:

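Consider the archive link order main.o A.a B.a. Suppose main.o references B.a, and A.a did not satisfy any undefined symbol when it was scanned. Because B.a in turn references A.a, this link order leads to an error. Would the link order main.o B.a A.a work? If main.o is later changed to reference A.a, and B.a does not satisfy any earlier undefined symbol, this link order also leads to an error.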

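One solution is main.o A.a B.a A.a. Repeating once is often enough, but consider this case. The first A.a loads only A.a(a.o), B.a loads only B.a(b.o), and the second A.a loads only A.a(c.o). If A.a(c.o) needs another member of B.a, this link order still leads to an undefined symbol error.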

We could repeat B.a once more, i.e. main.o A.a B.a A.a B.a, but a better solution is main.o --start-group A.a B.a --end-group.
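The following toy C model shows why the group helps. It is a sketch under simplifying assumptions, not how any particular linker implements archives. Each hypothetical `struct member` defines one symbol and references at most one. A single archive is rescanned until it stops pulling members. A group is rescanned as a whole until a full pass pulls nothing.

```c
#include <stdbool.h>
#include <string.h>

#define MAX_SYMS 16 /* toy limit, no overflow checking */

/* Hypothetical archive member: defines one symbol, references at most one. */
struct member {
  const char *defines;
  const char *references; /* NULL if none */
  bool pulled;
};

struct archive {
  struct member *members;
  int n;
};

struct symtab {
  const char *undef[MAX_SYMS];
  int n_undef;
  const char *def[MAX_SYMS];
  int n_def;
};

static bool contains(const char *const *v, int n, const char *s) {
  for (int i = 0; i < n; i++)
    if (strcmp(v[i], s) == 0) return true;
  return false;
}

/* Records a reference; it stays undefined until some member defines it. */
static void add_ref(struct symtab *t, const char *s) {
  if (!contains(t->def, t->n_def, s) && !contains(t->undef, t->n_undef, s))
    t->undef[t->n_undef++] = s;
}

/* Marks a symbol defined and removes it from the undefined list. */
static void add_def(struct symtab *t, const char *s) {
  t->def[t->n_def++] = s;
  for (int i = 0; i < t->n_undef; i++)
    if (strcmp(t->undef[i], s) == 0) {
      t->undef[i] = t->undef[--t->n_undef];
      break;
    }
}

/* Pulls members that resolve a pending undefined symbol, rescanning this
   archive until nothing changes. Returns the number of members pulled. */
static int scan_archive(struct archive *a, struct symtab *t) {
  int pulled = 0;
  bool progress = true;
  while (progress) {
    progress = false;
    for (int i = 0; i < a->n; i++) {
      struct member *m = &a->members[i];
      if (m->pulled || !contains(t->undef, t->n_undef, m->defines))
        continue;
      m->pulled = true;
      add_def(t, m->defines);
      if (m->references) add_ref(t, m->references);
      pulled++;
      progress = true;
    }
  }
  return pulled;
}

/* --start-group ... --end-group: rescan all archives until a pass pulls nothing. */
void scan_group(struct archive *a, int n, struct symtab *t) {
  int pulled;
  do {
    pulled = 0;
    for (int i = 0; i < n; i++)
      pulled += scan_archive(&a[i], t);
  } while (pulled > 0);
}
```

Take A.a = {a1, c (references b2)} and B.a = {b1 (references c), b2}, with main.o referencing b1. Scanning A.a and then B.a once leaves c undefined. scan_group resolves every symbol in two productive passes.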

--start-lib and --end-lib

A very useful feature invented by gold that can replace thin archives. It gives regular object files archive-like semantics (loaded on demand).

--whole-archive (described below) applies to .a files, while --start-lib applies to .o files.

If a.a contains b.o and c.o, ld ... --start-lib b.o c.o --end-lib acts like ld ... a.a.

I filed a feature request for GNU ld: https://sourceware.org/bugzilla/show_bug.cgi?id=24600

--sysroot

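This differs from the GCC/Clang driver's --sysroot. It applies when a linker script located under the sysroot directory opens a file by absolute path (INPUT or GROUP). In that case, the sysroot is prepended to the absolute path.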

--whole-archive and --no-whole-archive

A .a after --whole-archive is treated like .o files, without the lazy semantics. If a.a contains b.o and c.o, ld --whole-archive a.a --no-whole-archive has the same effect as ld b.o c.o.

--push-state and --pop-state

-Bstatic, --whole-archive, --as-needed, etc. are position-dependent options that represent boolean states. --push-state saves the boolean states of these options, and --pop-state restores them.

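When inserting options that change these states into a link command line, you usually want to restore the states afterwards. --push-state and --pop-state serve this purpose. For example, to make sure libc++.a and libc++abi.a are linked, use -Wl,--push-state,-Bstatic -lc++ -lc++abi -Wl,--pop-state.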
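The save/restore semantics amount to a stack of boolean states. In this toy model, `struct link_state`, `struct state_stack` and the depth limit are assumptions of the sketch, not a description of any linker's internals.

```c
#include <stdbool.h>

/* Hypothetical set of position-dependent boolean states. */
struct link_state {
  bool as_needed;     /* --as-needed / --no-as-needed */
  bool bstatic;       /* -Bstatic / -Bdynamic */
  bool whole_archive; /* --whole-archive / --no-whole-archive */
};

#define MAX_DEPTH 16 /* arbitrary limit for this sketch */

struct state_stack {
  struct link_state cur; /* state applied to the next input files */
  struct link_state saved[MAX_DEPTH];
  int depth;
};

/* --push-state: save the current state. */
bool push_state(struct state_stack *s) {
  if (s->depth == MAX_DEPTH) return false;
  s->saved[s->depth++] = s->cur;
  return true;
}

/* --pop-state: restore the most recently saved state. */
bool pop_state(struct state_stack *s) {
  if (s->depth == 0) return false; /* unmatched --pop-state */
  s->cur = s->saved[--s->depth];
  return true;
}
```

For -Wl,--push-state,-Bstatic -lc++ -lc++abi -Wl,--pop-state, the two libraries see `bstatic == true`. After --pop-state, whatever mode was active before (for example -Bdynamic with --as-needed) is back in effect.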

Dependency-related options

-z defs and -z undefs

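Whether to report an error for an undefined symbol from regular objects that cannot be resolved, i.e. cannot be bound at link time to a definition in the executable or in a link-time shared object. Executables default to -z defs/--no-undefined (disallowed), while shared objects default to -z undefs (allowed).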

Many build systems enable -z defs, which requires shared objects to specify all of their dependencies at link time (link what you use).

--allow-shlib-undefined and --no-allow-shlib-undefined

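Whether to report an error for an undefined symbol from shared objects that cannot be resolved. Executables default to --no-allow-shlib-undefined (disallowed), while shared objects default to --allow-shlib-undefined (allowed).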

For the following code, where f is not defined anywhere, linking the executable reports an error:

// a.so
void f();
void g() { f(); }

// exe
void g();
int main() { g(); }

With --allow-shlib-undefined, the link succeeds, but ld.so reports an error at runtime. In glibc, the error is: symbol lookup error: ... undefined symbol:

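GNU ld uses a complex algorithm that walks the transitive closure. It reports an error only if no shared object in the transitive closure can resolve the undefined symbol. gold and LLD use a simplified rule. If all DT_NEEDED dependencies of a shared object are linked directly, the error is enabled. If some dependencies are not linked, gold/LLD cannot accurately tell whether one of those shared objects could provide the definition, so they conservatively do not report an error.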

It is worth mentioning that -z defs/-z undefs/--no-undefined and --[no-]allow-shlib-undefined can all be controlled by a single option, --unresolved-symbols.

--warn-backrefs

LLD-specific. See http://lld.llvm.org/ELF/warn_backrefs.html

Layout-related options

--no-rosegment

LLD uses a design with two RW PT_LOAD segments:

  • R PT_LOAD
  • RX PT_LOAD
  • RW PT_LOAD (overlapping PT_GNU_RELRO)
  • RW PT_LOAD

Specifying this option merges the R PT_LOAD and the RX PT_LOAD.

-z separate-loadable-segments

The traditional LLD layout: no two PT_LOAD segments overlap, so no byte is mapped into two memory mappings at the same time.

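This is implemented by aligning the address of each new PT_LOAD to max-page-size. By default, LLD creates 4 PT_LOAD segments (R, RX, RW(RELRO), RW(non-RELRO)), and each of the three alignments may waste some bytes in the output file. On AArch64 and PowerPC, the ABI-specified max-page-size is large (65536), so up to about 65536*3 bytes can be wasted.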
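A toy model of the padding arithmetic follows. It assumes segments are laid out back to back in the file and that every PT_LOAD after the first starts at a file offset that is a multiple of max-page-size. `pad_to` and `layout_padding` are hypothetical helpers. Real linkers also account for headers and the address/offset congruence.

```c
#include <stdint.h>

/* Bytes needed to round `off` up to a multiple of `align` (a power of two). */
uint64_t pad_to(uint64_t off, uint64_t align) {
  return ((off + align - 1) & ~(align - 1)) - off;
}

/* Total padding when segments of the given sizes are placed back to back
   and every segment after the first starts at a max-page-size boundary. */
uint64_t layout_padding(const uint64_t *sizes, int n, uint64_t max_page) {
  uint64_t off = 0, wasted = 0;
  for (int i = 0; i < n; i++) {
    if (i > 0) {
      uint64_t p = pad_to(off, max_page);
      wasted += p;
      off += p;
    }
    off += sizes[i];
  }
  return wasted;
}
```

With four one-byte segments and max-page-size 65536, the result is 3*65535 bytes. This is the worst case behind the "up to about 65536*3 bytes" figure.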

-z separate-code

Introduced in binutils 2.31 and the default on Linux/x86. GNU ld uses:

  • R PT_LOAD
  • RX PT_LOAD
  • R PT_LOAD
  • RW PT_LOAD
    • The prefix is PT_GNU_RELRO
    • The non-PT_GNU_RELRO part

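separate-code means that a file byte mapped into an executable segment (RX PT_LOAD) is never also mapped into an R PT_LOAD. Note that the R after RX is suboptimal. Ideally it would be merged with the first R, but that seems difficult to implement in GNU ld.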

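I added this option to LLD 10. Its semantics are similar to GNU ld, but the layout differs: there is no need to mimic the suboptimal layout with two R segments. The two RW PT_LOAD segments may overlap, so the address of the second one does not need to be aligned. At most max-page-size*2 bytes can be wasted.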

-z noseparate-code

The classic layout, which allows the executable segment to overlap with other PT_LOAD segments. GNU ld typically uses:

  • RX PT_LOAD
  • RW PT_LOAD
    • The prefix is PT_GNU_RELRO. This part is made read-only with mprotect after ld.so resolves the dynamic relocations
    • The non-PT_GNU_RELRO part. This part stays writable at runtime

The first PT_LOAD is often loosely called the text segment, which is inaccurate: non-executable read-only data (rodata) is also in it.

LLD 10 uses this layout by default, which requires no PT_LOAD alignment.

Relocation-related options

--apply-dynamic-relocs

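Some psABIs use RELA (AArch64, PowerPC, RISC-V, x86-64, etc.), and their dynamic relocations contain an addend field. On these architectures, by default the linker writes 0 at the relocated location instead of the addend. This slightly helps compression if the executable/shared object is compressed. This option makes the linker write the link-time values instead.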

--emit-relocs

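Can be used with -no-pie/-pie/-shared to get an effect similar to -r: the input relocations are preserved. This is useful for post-link binary analysis. The only two uses I know of are the Linux kernel x86 CONFIG_RELOCATABLE and BOLT.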

--pack-dyn-relocs=value

relr enables DT_RELR, a more compact encoding of relative relocations (R_*_RELATIVE). Relative relocations are common in executables linked with -pie.
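A sketch of the RELR encoding for a 64-bit target follows. It assumes the input offsets are sorted, unique and 8-byte aligned. An even entry is an address that needs a relative relocation. An odd entry is a bitmap whose bits 1 to 63 describe the following 63 words. `relr_encode` and `relr_decode` are hypothetical names. The greedy encoder follows the usual approach but is not copied from any linker.

```c
#include <stddef.h>
#include <stdint.h>

#define WORD 8   /* 64-bit target */
#define NBITS 63 /* words described by one bitmap entry */

/* Encodes sorted, WORD-aligned offsets into RELR entries; returns the count. */
size_t relr_encode(const uint64_t *off, size_t n, uint64_t *out) {
  size_t k = 0;
  for (size_t i = 0; i < n;) {
    out[k++] = off[i];             /* address entry (LSB 0) */
    uint64_t base = off[i] + WORD; /* first word covered by the next bitmap */
    i++;
    for (;;) {
      uint64_t bitmap = 0;
      for (; i < n; i++) {
        uint64_t d = off[i] - base;
        if (d >= NBITS * WORD || d % WORD) break;
        bitmap |= (uint64_t)1 << (d / WORD);
      }
      if (!bitmap) break;
      out[k++] = (bitmap << 1) | 1; /* bitmap entry (LSB 1) */
      base += NBITS * WORD;
    }
  }
  return k;
}

/* Decodes RELR entries back into offsets; returns the count. */
size_t relr_decode(const uint64_t *ent, size_t n, uint64_t *out) {
  size_t k = 0;
  uint64_t where = 0;
  for (size_t i = 0; i < n; i++) {
    uint64_t e = ent[i];
    if ((e & 1) == 0) {
      out[k++] = e;
      where = e + WORD;
    } else {
      for (int bit = 0; (e >>= 1) != 0; bit++)
        if (e & 1) out[k++] = where + (uint64_t)bit * WORD;
      where += NBITS * WORD;
    }
  }
  return k;
}
```

Five relocations at 0x1000, 0x1008, 0x1010, 0x1020 and 0x2000 encode to three words: 0x1000, the bitmap 0x17 covering the next three, and 0x2000. With REL/RELA, each relative relocation needs a 16- or 24-byte record instead.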

-z text and -z notext

-z text disallows text relocations. -z notext allows text relocations.

Since binutils 2.35, GNU ld on Linux/x86 enables the configure-time option --enable-textrel-warning=warning by default. It issues a warning if there are text relocations.

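The term "text relocations" is a misnomer: it actually refers to dynamic relocations applied to read-only sections. Some relocations in .o files have values that cannot be determined at link time. They are converted to dynamic relocations (with the same type as in the .o), and ld.so computes their values at runtime. If the target section lacks the SHF_WRITE flag, ld.so must temporarily mprotect the memory mapping to change its permissions, apply the modification, and then restore the read-only permission. This prevents page sharing.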

Shared objects run into text relocations in more situations than executables. Executables can avoid some text relocations with canonical PLT entries and copy relocations.

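Linkers differ in which relocation types they allow as text relocations, and this also varies by architecture. GNU ld allows some types that glibc ld.so supports. On x86-64, all the linkers allow R_X86_64_64 and R_X86_64_PC64.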

In the assembly program below, defined_in_so is a symbol defined in some shared object. The comments show the scenario for each kind of text relocation.

.globl global
global:
local:
.quad local # (-pie or -shared) R_X86_64_RELATIVE
.quad global # (-pie) R_X86_64_RELATIVE or (-shared) R_X86_64_64
.quad defined_in_so # (-shared) R_X86_64_64
.quad defined_in_so - . # (-shared) R_X86_64_PC64

-no-pie-pie模式下,根据defined_in_so的符号类型,链接器会作出不同选择:

  • STT_FUNC: create a canonical PLT entry
  • STT_OBJECT: create a copy relocation
  • STT_NOTYPE: GNU ld creates a copy relocation. LLD creates a text relocation

Section-related options

--gc-sections

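A very common option. The linker performs liveness analysis and removes unused sections from the output. It is most effective when code is compiled with -ffunction-sections and -fdata-sections. Without them, each object file usually has one .text and one .data section, so little can be discarded.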

GC roots:

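  • Sections containing the defined symbols specified by --entry/--init/--fini/-u
  • Sections containing defined symbols referenced by linker script expressions
  • Sections containing defined symbols in .dynsym
  • Sections of type SHT_PREINIT_ARRAY/SHT_INIT_ARRAY/SHT_FINI_ARRAY
  • Sections named .ctors/.dtors/.init/.fini/.jcr
  • SHT_NOTE sections not in a section group (the section group rule exists for the Fedora watermark)
  • Personality routines and language-specific data areas referenced by .eh_frame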
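The liveness analysis is a graph reachability problem. The linker marks every section reachable from the GC roots via relocations and discards the rest. Below is a toy C sketch using a fixed-size adjacency matrix. `struct sec_graph` is hypothetical, and details such as .eh_frame handling and section group/SHF_LINK_ORDER dependencies are ignored.

```c
#include <stdbool.h>

#define MAX_SECS 32

/* Hypothetical section graph: edges[i][j] is true if section i has a
   relocation referencing a symbol defined in section j. */
struct sec_graph {
  int n;
  bool edges[MAX_SECS][MAX_SECS];
  bool is_root[MAX_SECS];
  bool live[MAX_SECS];
};

/* Mark phase with an explicit worklist. Each section is pushed at most
   once, so the worklist never exceeds n entries. Sections left with
   live == false would be discarded. */
void mark_live(struct sec_graph *g) {
  int work[MAX_SECS], top = 0;
  for (int i = 0; i < g->n; i++) {
    g->live[i] = g->is_root[i];
    if (g->is_root[i]) work[top++] = i;
  }
  while (top > 0) {
    int s = work[--top];
    for (int t = 0; t < g->n; t++)
      if (g->edges[s][t] && !g->live[t]) {
        g->live[t] = true;
        work[top++] = t;
      }
  }
}
```

A section that references live code but is not itself referenced from a root (section 3 in the test) is still discarded, because edges are only followed away from roots.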

--icf=all --icf=safe

Enable Identical Code Folding. The name is inaccurate: (1) it also applies to read-only data; (2) the unit of folding is a section, not a function.

For a group of identical sections, the linker picks one as the representative, discards the rest, and redirects relocations to the representative.

gold implements a relocation-based --icf=safe. LLD implements --icf=safe based on the LLVM address significance table.
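A naive single-pass sketch of the folding step follows. It compares only section bytes and relocation targets. Real implementations also compare relocation types, offsets and addends, and iterate to a fixed point because sections can reference each other. `struct isec` and `fold_identical` are hypothetical names.

```c
#include <stdbool.h>
#include <string.h>

#define MAX_RELOCS 4

/* Hypothetical input section: raw bytes plus the indices of the sections
   its relocations point to. */
struct isec {
  const unsigned char *data;
  size_t size;
  int reloc_target[MAX_RELOCS];
  int nrelocs;
};

static bool equal_sections(const struct isec *a, const struct isec *b,
                           const int *repr) {
  if (a->size != b->size || a->nrelocs != b->nrelocs) return false;
  if (memcmp(a->data, b->data, a->size) != 0) return false;
  for (int k = 0; k < a->nrelocs; k++) /* targets compared via representatives */
    if (repr[a->reloc_target[k]] != repr[b->reloc_target[k]]) return false;
  return true;
}

/* Sets repr[i] to the representative of section i (single pass only). */
void fold_identical(const struct isec *s, int n, int *repr) {
  for (int i = 0; i < n; i++) repr[i] = i;
  for (int i = 0; i < n; i++)
    for (int j = 0; j < i; j++)
      if (repr[j] == j && equal_sections(&s[i], &s[j], repr)) {
        repr[i] = j; /* relocations to i would be redirected to j */
        break;
      }
}
```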

--symbol-ordering-file=file

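Specifies a text file with one defined symbol per line. If symbol A comes before symbol B, then within each input section description the section containing A is placed before the section containing B.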

If a symbol is undefined or its section is discarded, the linker emits a warning unless --no-warn-symbol-ordering is specified.

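If one function frequently calls another, placing their input sections close together in the linked image makes it more likely that they land on the same page. This reduces the page working set and TLB thrashing. See Karl Pettis and Robert C. Hansen's Profile Guided Code Positioning.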

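This option is LLD-specific. gold has --section-ordering-file, which sorts by section name. In practice, that requires text/data sections to have distinct names, so clang -fno-unique-section-names cannot be used. Sorting by symbol name works even with -fno-unique-section-names.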
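A sketch of the sorting within one input section description follows. It assumes each hypothetical `struct osec` is identified by a single symbol. Listed symbols get negative priorities in file order, and unlisted sections keep priority 0, so they follow in their original order. Tie-breaking on the original index keeps qsort (which is not stable) deterministic. `apply_order` is an illustrative name.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical input section, identified by the symbol it defines. */
struct osec {
  const char *sym;
  int orig_index; /* position in the input section description */
  int priority;   /* lower sorts first */
};

static int cmp_sec(const void *pa, const void *pb) {
  const struct osec *a = pa, *b = pb;
  if (a->priority != b->priority) return a->priority < b->priority ? -1 : 1;
  return a->orig_index - b->orig_index; /* keep original order otherwise */
}

/* order[] holds the lines of the symbol ordering file. */
void apply_order(struct osec *s, int n, const char *const *order, int norder) {
  for (int i = 0; i < n; i++) {
    s[i].orig_index = i;
    s[i].priority = 0;
    for (int k = 0; k < norder; k++)
      if (strcmp(s[i].sym, order[k]) == 0) {
        s[i].priority = k - norder; /* negative: listed symbols come first */
        break;
      }
  }
  qsort(s, (size_t)n, sizeof *s, cmp_sec);
}
```

With sections a, b, c, d and an ordering file listing c then a, the result is c, a, b, d.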

--cref

Output a cross reference table. For each non-local symbol, it lists the file that defines it and the files that reference it.

-M and -Map=file

Output a link map, which shows the addresses and file offsets of output sections and the input sections they contain.

Warning-related options

--fatal-warnings

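Turn warnings into errors. Warnings and errors differ in whether the message contains the string "warning" or "error". More importantly, an error prevents the link output from being written.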

--noinhibit-exec

Turn some errors into warnings. Do not also specify --fatal-warnings, which would turn the downgraded warnings back into errors :)

--build-id=value

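Generate .note.gnu.build-id, which identifies a link result; SHA-1 is commonly used. The linker fills the .note.gnu.build-id region with zeros, hashes every byte of the output, and writes the result back into .note.gnu.build-id. The exact computation differs between linkers.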
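A sketch of the zero, hash, fill procedure described above follows. Several assumptions apply. The image is an in-memory byte buffer, and the build-id descriptor is 8 bytes at a known offset. FNV-1a stands in for SHA-1 purely to keep the example short. `fill_build_id` is a hypothetical helper.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* FNV-1a (64-bit), a stand-in for SHA-1 in this sketch only. */
static uint64_t fnv1a64(const unsigned char *p, size_t n) {
  uint64_t h = 0xcbf29ce484222325ULL;
  for (size_t i = 0; i < n; i++) {
    h ^= p[i];
    h *= 0x100000001b3ULL;
  }
  return h;
}

/* Zero the 8-byte descriptor, hash the whole image, then store the digest. */
void fill_build_id(unsigned char *image, size_t size, size_t id_off) {
  memset(image + id_off, 0, 8);
  uint64_t h = fnv1a64(image, size);
  for (int i = 0; i < 8; i++)
    image[id_off + i] = (unsigned char)(h >> (8 * i)); /* little-endian */
}
```

Zeroing first makes the result independent of whatever bytes the descriptor held before hashing. Two otherwise identical images therefore always get the same build ID.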

--compress-debug-sections=zlib

Compress the .debug_* sections of the output file with zlib and mark them SHF_COMPRESSED. SHF_COMPRESSED was the last feature merged into the ELF specification; since then, the ELF specification has been unmaintained...

--hash-style

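--hash-style=sysv selects DT_HASH, a hash table defined by the ELF specification to speed up symbol lookup. --hash-style=gnu selects DT_GNU_HASH, and --hash-style=both emits both. DT_GNU_HASH beats DT_HASH in both space and speed. Mips has DT_MIPS_XHASH, a good example of the Mips ABI being too clever for its own good; I personally think it solves the wrong problem. There is actually a way to use DT_GNU_HASH on Mips. Perhaps the Mips community, having gotten the thing merged, does not want to revisit it.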
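For reference, the two hash functions are short. The sketch below follows the well-known definitions: the System V ABI hash used by DT_HASH and the h*33+c (Bernstein) hash used by DT_GNU_HASH. Only the hash functions are shown, not the bucket/chain arrays or the Bloom filter of DT_GNU_HASH.

```c
#include <stdint.h>

/* Hash function used by DT_HASH (System V ABI). */
uint32_t elf_sysv_hash(const char *name) {
  uint32_t h = 0, g;
  for (const unsigned char *p = (const unsigned char *)name; *p; p++) {
    h = (h << 4) + *p;
    g = h & 0xf0000000;
    if (g) h ^= g >> 24;
    h &= ~g; /* keep h below 2^28 */
  }
  return h;
}

/* Hash function used by DT_GNU_HASH: h = h * 33 + c, starting at 5381. */
uint32_t gnu_hash(const char *name) {
  uint32_t h = 5381;
  for (const unsigned char *p = (const unsigned char *)name; *p; p++)
    h = h * 33 + *p;
  return h;
}
```

For example, `gnu_hash("printf")` is 0x156b2bb8.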

--no-ld-generated-unwind-info

参见PR12570 .plt has no associated .eh_frame/.debug_frame

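When the PC is in a PLT entry and the linker has not synthesized .eh_frame information, the unwinder may fail to unwind correctly. On i386 and x86-64 with lazy binding, the first call through a PLT entry executes a push instruction. Once ESP/RSP has changed, a PLT entry without .eh_frame unwind information may cause the unwinder to fail, which hurts profiler accuracy.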

jmp *got(%rip)
pushq $0x0
jmpq .plt

However, I think this feature is obsolete and irrelevant nowadays. To recognize the PLT name, a profiler needs to do:

  • Parse the .plt section to know the region of PLT entries
  • Parse .rel[a].plt to get R_*_JUMP_SLOT dynamic relocations and their referenced symbol names.
  • If the current PC is within the PLT region, parse nearby instructions and find the GOT load. The associated R_*_JUMP_SLOT identifies the symbol name.
  • Concatenate the symbol name and @plt to form foo@plt

Note: foo@plt is a convention used by some tools, but it is not a name in the symbol table.

GDB has heuristics to recognize this situation.

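This problem does not affect C++ exceptions. A PLT entry performs a tail call. _Unwind_RaiseException, called by __cxa_throw, unwinds through the ld.so resolver and the PLT entry's tail calls. The PC is restored to the instruction after the call in the PLT entry's caller.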

// b.cc - b.so
void ext() { throw 3; }

// a.cc - exe
#include <stdio.h>

void ext();
void foo() {
  try {
    ext(); // PLT entry
  } catch (int x) {
    printf("%d\n", x);
  }
}

int main() {
  foo();
}

-O

The optimization level. This is different from the compiler driver option -O.

In LLD, -O0 disables constant merging for SHF_MERGE sections. -O2 enables string suffix merging for SHF_MERGE|SHF_STRINGS sections and makes --compress-debug-sections=zlib use a higher zlib compression level.
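String suffix (tail) merging stores a string that is a suffix of another string inside it: "bc" can point into "abc". The quadratic sketch below illustrates the idea only; real linkers avoid the quadratic search. `tail_merge` and its helpers are hypothetical names.

```c
#include <stdbool.h>
#include <string.h>

static bool is_suffix(const char *s, const char *of) {
  size_t ls = strlen(s), lo = strlen(of);
  return ls <= lo && memcmp(of + lo - ls, s, ls) == 0;
}

/* True if strs[i] can live inside another string: a longer string ending
   with it, or an identical string that appears earlier. */
static bool has_host(const char *const *strs, int n, int i) {
  size_t li = strlen(strs[i]);
  for (int j = 0; j < n; j++) {
    if (j == i) continue;
    size_t lj = strlen(strs[j]);
    if ((lj > li || (lj == li && j < i)) && is_suffix(strs[i], strs[j]))
      return true;
  }
  return false;
}

/* Tail-merges NUL-terminated strings into buf; off[i] receives the offset
   of strs[i]. Returns the table size. */
size_t tail_merge(const char *const *strs, int n, char *buf, size_t *off) {
  size_t size = 0;
  for (int i = 0; i < n; i++) { /* pass 1: emit strings without a host */
    if (has_host(strs, n, i)) continue;
    size_t len = strlen(strs[i]) + 1; /* include the NUL terminator */
    memcpy(buf + size, strs[i], len);
    off[i] = size;
    size += len;
  }
  for (int i = 0; i < n; i++) { /* pass 2: point the rest into an emitted string */
    if (!has_host(strs, n, i)) continue;
    for (int j = 0; j < n; j++)
      if (!has_host(strs, n, j) && is_suffix(strs[i], strs[j])) {
        off[i] = off[j] + strlen(strs[j]) - strlen(strs[i]);
        break;
      }
  }
  return size;
}
```

For {"abc", "bc", "xyz", "c", "abc"}, the table is "abc\0xyz\0" (8 bytes). "bc" and "c" point into "abc" at offsets 1 and 2.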

-plugin file

GNU ld and gold support this option to load the GCC LTO plugin (liblto_plugin.so) or the LLVM LTO plugin (LLVMgold.so).

The plugin API is defined by binutils-gdb/include/plugin-api.h.

Note that although its name contains "gold", LLVMgold.so can also be used with GNU ld, nm and ar.

