
lld 16 ELF changes

source link: https://maskray.me/blog/2023-03-19-lld-16-elf-changes


llvm-project 16 was just released. I added some lld/ELF notes to https://github.com/llvm/llvm-project/blob/release/16.x/lld/docs/ReleaseNotes.rst. Here I will elaborate on some changes.

  • Link speed improved greatly compared with lld 15.0. Notably input section initialization and relocation scanning are now parallel. (D130810) (D133003)
  • ELFCOMPRESS_ZSTD compressed input sections are now supported. (D129406)
  • --compress-debug-sections=zstd is now available to compress debug sections with zstd (ELFCOMPRESS_ZSTD). (D133548)
  • --no-warnings/-w is now available to suppress warnings. (D136569)
  • DT_RISCV_VARIANT_CC is now produced if at least one R_RISCV_JUMP_SLOT relocation references a symbol with the STO_RISCV_VARIANT_CC bit. (D107951)
  • DT_STATIC_TLS is now set for AArch64/PPC32/PPC64 initial-exec TLS models when producing a shared object.
  • --no-undefined-version is now the default; symbols named in version scripts that have no matching symbol in the output will be reported. Use --undefined-version to revert to the old behavior. (D135402)
  • -V is now an alias for -v to support gcc -fuse-ld=lld -v on many targets.
  • -r no longer defines __global_pointer$ or _TLS_MODULE_BASE_.
  • A corner case of mixed GCC and Clang object files (STB_WEAK and STB_GNU_UNIQUE in different COMDATs) is now supported. (D136381)
  • The output SHT_RISCV_ATTRIBUTES section now merges all input components instead of picking the first input component. (D138550)
  • For x86-32, the -fno-plt general-dynamic and local-dynamic TLS models (call *___tls_get_addr@GOT(%reg)) are now supported. Previously the linked output might crash at runtime.
  • Armv4(T) thunks are now supported. (D139888) (D141272)
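Two of the items above concern zstd-compressed sections, which reuse the standard ELF compression framing: a section with the SHF_COMPRESSED flag begins with an Elf64_Chdr whose ch_type field selects the algorithm (ELFCOMPRESS_ZLIB = 1, ELFCOMPRESS_ZSTD = 2 in the gABI). A minimal sketch decoding that header in Python (the decode_chdr64 helper is illustrative, not part of any tool):

```python
import struct

# Compression type constants from the ELF gABI.
ELFCOMPRESS_ZLIB = 1
ELFCOMPRESS_ZSTD = 2

def decode_chdr64(data: bytes):
    """Decode a little-endian Elf64_Chdr: ch_type, ch_reserved, ch_size, ch_addralign."""
    ch_type, _reserved, ch_size, ch_addralign = struct.unpack_from("<IIQQ", data)
    name = {ELFCOMPRESS_ZLIB: "ELFCOMPRESS_ZLIB",
            ELFCOMPRESS_ZSTD: "ELFCOMPRESS_ZSTD"}.get(ch_type, hex(ch_type))
    return name, ch_size, ch_addralign

# Example: header of a zstd-compressed section whose uncompressed
# size is 0x1000 bytes with 8-byte alignment.
hdr = struct.pack("<IIQQ", ELFCOMPRESS_ZSTD, 0, 0x1000, 8)
print(decode_chdr64(hdr))  # ('ELFCOMPRESS_ZSTD', 4096, 8)
```

The compressed payload follows the 24-byte header; ch_size records the uncompressed size so the consumer can allocate before decompressing.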

Speed

Link speed has greatly improved compared to lld 15.0.0.

In this release cycle, I made input section initialization and relocation scanning parallel. (D130810 D133003)
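The key property exploited by both patches is that the per-input-section work is independent, so it can be distributed across worker threads and must produce the same result as the serial loop. A very high-level sketch of that shape (scan_section is a hypothetical stand-in, not lld's actual code):

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for per-input-section work (initialization or
# relocation scanning); each item is independent of the others.
def scan_section(section):
    return sum(section)  # placeholder for real per-section processing

sections = [list(range(n)) for n in (10, 100, 1000)]

# Serial baseline.
serial = [scan_section(s) for s in sections]

# Parallel version: same results, work spread over a thread pool,
# loosely mirroring ld.lld's --threads=N behavior.
with ThreadPoolExecutor(max_workers=8) as pool:
    parallel = list(pool.map(scan_section, sections))

assert serial == parallel
```

Because the outputs must match the serial order exactly, the speedup comes purely from overlapping independent work, not from changing what the linker computes.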

% hyperfine --warmup 1 --min-runs 20 "numactl -C 20-27 "{/tmp/out/custom15,/tmp/out/custom16}"/bin/ld.lld @response.txt --threads=8"
Benchmark 1: numactl -C 20-27 /tmp/out/custom15/bin/ld.lld @response.txt --threads=8
Time (mean ± σ): 676.3 ms ± 4.4 ms [User: 751.5 ms, System: 472.3 ms]
Range (min … max): 667.9 ms … 686.1 ms 20 runs

Benchmark 2: numactl -C 20-27 /tmp/out/custom16/bin/ld.lld @response.txt --threads=8
Time (mean ± σ): 453.5 ms ± 5.3 ms [User: 760.5 ms, System: 448.9 ms]
Range (min … max): 445.4 ms … 462.8 ms 20 runs

Summary
'numactl -C 20-27 /tmp/out/custom16/bin/ld.lld @response.txt --threads=8' ran
1.49 ± 0.02 times faster than 'numactl -C 20-27 /tmp/out/custom15/bin/ld.lld @response.txt --threads=8'

(--threads=4 => 1.38x (0.7009s => 0.5096s))
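The summary ratios hyperfine prints are just the quotient of the mean wall times, which is easy to sanity-check from the numbers above (speedup is a trivial helper, not a hyperfine API):

```python
# Reproduce hyperfine's summary ratio from the mean wall times above.
def speedup(old: float, new: float) -> float:
    return round(old / new, 2)

print(speedup(676.3, 453.5))  # --threads=8: 1.49x
print(speedup(700.9, 509.6))  # --threads=4: 1.38x
```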

Linking a -DCMAKE_BUILD_TYPE=Debug build of clang:

% hyperfine --warmup 2 --min-runs 25 "numactl -C 20-27 /tmp/out/custom"{15,16}"/bin/ld.lld @response.txt --threads=8"
Benchmark 1: numactl -C 20-27 /tmp/out/custom15/bin/ld.lld @response.txt --threads=8
Time (mean ± σ): 3.664 s ± 0.039 s [User: 6.657 s, System: 3.434 s]
Range (min … max): 3.605 s … 3.781 s 25 runs

Benchmark 2: numactl -C 20-27 /tmp/out/custom16/bin/ld.lld @response.txt --threads=8
Time (mean ± σ): 3.141 s ± 0.028 s [User: 7.071 s, System: 3.150 s]
Range (min … max): 3.086 s … 3.212 s 25 runs

Summary
'numactl -C 20-27 /tmp/out/custom16/bin/ld.lld @response.txt --threads=8' ran
1.17 ± 0.02 times faster than 'numactl -C 20-27 /tmp/out/custom15/bin/ld.lld @response.txt --threads=8'
(--threads=1 => 1.06x (7.620s => 7.202s), --threads=4 => 1.11x (4.138s => 3.727s))

Linking a default build of chrome:

% hyperfine --warmup 2 --min-runs 25 "numactl -C 20-27 /tmp/out/custom"{15,16}"/bin/ld.lld @response.txt --threads=8"
Benchmark 1: numactl -C 20-27 /tmp/out/custom15/bin/ld.lld @response.txt --threads=8
Time (mean ± σ): 4.159 s ± 0.018 s [User: 6.703 s, System: 2.705 s]
Range (min … max): 4.136 s … 4.207 s 25 runs

Benchmark 2: numactl -C 20-27 /tmp/out/custom16/bin/ld.lld @response.txt --threads=8
Time (mean ± σ): 3.700 s ± 0.027 s [User: 7.417 s, System: 2.556 s]
Range (min … max): 3.660 s … 3.771 s 25 runs

Summary
'numactl -C 20-27 /tmp/out/custom16/bin/ld.lld @response.txt --threads=8' ran
1.12 ± 0.01 times faster than 'numactl -C 20-27 /tmp/out/custom15/bin/ld.lld @response.txt --threads=8'
(--threads=1 => )
