
glibc and DT_GNU_HASH

source link: https://maskray.me/blog/2022-08-21-glibc-and-dt-gnu-hash

tl;dr "Easy Anti-Cheat"'s incompatibility with glibc 2.36 is an instance of Hyrum's law.

glibc 2.36 was released on 2022-08-02. On 2022-08-03, Jelgnum reported that with the new glibc, "Easy Anti-Cheat" cannot load the anti-cheat module (see "GLIBC update broke EAC for most games that use it"). Frogging101 bisected the issue to the glibc commit "Do not use --hash-style=both for building glibc shared objects". The issue led to heated discussions, some clickbait news, and claims such as "glibc breaks ABI" and "glibc does not prioritize compatibility with pre-existing applications".

I feel compelled to demystify the story and hope that people will stop defaming glibc.

Root cause

Carlos O'Donell provided a great summary in a reply to the libc-alpha thread "Should we make DT_HASH dynamic section for glibc?" on 2022-08-08.

I do not use the game software, so my reasoning about "Easy Anti-Cheat" is based on others' information. The glibc commit dropped the compiler driver option -Wl,--hash-style=both when linking glibc-provided shared objects (e.g. libc.so.6, libpthread.so.0). Many Linux distributions have configured their GCC to pass --hash-style=gnu to the linker. In the absence of -Wl,--hash-style=both, the linker produces a .gnu.hash section and a DT_GNU_HASH tag, and suppresses .hash and DT_HASH. The glibc commit does not change how a user executable or shared object is linked.
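To make the difference concrete, here is a minimal sketch that reports which hash-table tags an ELF image carries in its dynamic section. It is only an illustration: the function name hash_tags is made up for this example, and the code assumes a little-endian ELF64 file (e.g. x86-64) whose PT_DYNAMIC segment is present in the file.

```python
import struct

PT_DYNAMIC = 2
DT_NULL = 0
HASH_TAGS = {4: "DT_HASH", 0x6FFFFEF5: "DT_GNU_HASH"}


def hash_tags(elf: bytes) -> set:
    """Return the names of the hash-table dynamic tags found in an ELF image.

    hash_tags is a hypothetical helper for this sketch, not an existing API.
    """
    # e_ident: magic, EI_CLASS == ELFCLASS64 (2), EI_DATA == ELFDATA2LSB (1).
    if elf[:4] != b"\x7fELF" or elf[4] != 2 or elf[5] != 1:
        raise ValueError("this sketch only handles little-endian ELF64")
    (e_phoff,) = struct.unpack_from("<Q", elf, 0x20)
    e_phentsize, e_phnum = struct.unpack_from("<HH", elf, 0x36)
    found = set()
    for i in range(e_phnum):
        ph = e_phoff + i * e_phentsize
        (p_type,) = struct.unpack_from("<I", elf, ph)
        if p_type != PT_DYNAMIC:
            continue
        (p_offset,) = struct.unpack_from("<Q", elf, ph + 8)
        (p_filesz,) = struct.unpack_from("<Q", elf, ph + 32)
        # Each Elf64_Dyn entry is a (d_tag, d_val) pair of 8-byte words.
        for off in range(p_offset, p_offset + p_filesz - 15, 16):
            d_tag, _d_val = struct.unpack_from("<qQ", elf, off)
            if d_tag == DT_NULL:
                break
            if d_tag in HASH_TAGS:
                found.add(HASH_TAGS[d_tag])
    return found
```

Calling hash_tags(open(path, "rb").read()) on a shared object linked with --hash-style=gnu should return only DT_GNU_HASH, while --hash-style=both should yield both names. The same information is shown by readelf -d.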

Apparently "Easy Anti-Cheat" does something similar to a dynamic loader (rtld), most likely some form of symbol lookup. It requires that every(?) loaded shared object has the DT_HASH tag, and it has no DT_GNU_HASH support. When the software encounters a glibc libc.so.6 without DT_HASH, it reports an error.

Note: "Easy Anti-Cheat"'s reliance on DT_HASH was noticed by Gentoo users back in 2022-04 (https://github.com/anyc/steam-overlay/issues/309).

What is DT_HASH?

DT_HASH is a dynamic tag specified by the System V Application Binary Interface (generic ABI). It is used by a dynamic loader to perform symbol lookup (for dynamic relocations and the dlsym family of functions). "ELF: symbol lookup via DT_HASH" has a great description of the format.
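As a rough illustration of how a DT_HASH lookup works, the sketch below implements the generic ABI hash function and a bucket/chain walk over an in-memory model of the table. The helper names (build_sysv_table, sysv_lookup) and the list-based representation are assumptions made for this example; a real table is an array of words laid out as nbucket, nchain, bucket[nbucket], chain[nchain], and symbol names live in the string table.

```python
def sysv_hash(name: bytes) -> int:
    """Hash function from the generic ABI, truncated to 32 bits."""
    h = 0
    for c in name:
        h = ((h << 4) + c) & 0xFFFFFFFF
        g = h & 0xF0000000
        if g:
            h ^= g >> 24
        h &= ~g & 0xFFFFFFFF
    return h


def build_sysv_table(names: list, nbucket: int):
    """Build bucket/chain arrays; names[0] stands in for the null symbol."""
    bucket = [0] * nbucket
    chain = [0] * len(names)  # nchain equals the number of symbols
    for i in range(1, len(names)):
        b = sysv_hash(names[i]) % nbucket
        chain[i] = bucket[b]  # link to the previous head of this bucket
        bucket[b] = i
    return bucket, chain


def sysv_lookup(name: bytes, names: list, bucket: list, chain: list):
    """Return the symbol index of name, or None if it is not present."""
    i = bucket[sysv_hash(name) % len(bucket)]
    while i != 0:  # index 0 (STN_UNDEF) terminates a chain
        if names[i] == name:
            return i
        i = chain[i]
    return None


names = [b"", b"printf", b"malloc", b"free"]
bucket, chain = build_sysv_table(names, nbucket=2)
print(sysv_lookup(b"malloc", names, bucket, chain))
```

Note that every failed lookup has to walk a whole chain and compare strings along the way, which is one of the costs DT_GNU_HASH was designed to reduce.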

In "Figure 5-10: Dynamic Array Tags, d_tag", the generic ABI says that DT_HASH is mandatory in an executable or shared object. I will talk about this later.

What is DT_GNU_HASH?

In 2006, glibc commit 871b91589bf4f6dfe19d5987b0a05bd7cf936ecc added DT_GNU_HASH support. (In the old days, commit messages only said what a commit did, not why.) The 2006-06 thread "[PATCH] DT_GNU_HASH: ~ 50% dynamic linking improvement" has some discussion. A 2006-10 message, "GNU_HASH section format", describes the format. Unfortunately, DT_GNU_HASH is not specified in any more official document.

For a curious reader who doesn't want to learn the history, just read "ELF: better symbol lookup via DT_GNU_HASH".

Ali Bahrami's "The Cost Of ELF Symbol Hashing" describes the advantages of DT_GNU_HASH over DT_HASH:

  • An improved hash function is used, to better spread the hash keys and reduce hash chain length.
  • The dynamic symbol table is sorted into hash order, such that memory access tends to be adjacent and monotonically increasing, which can help cache behavior. (Note that the Solaris link-editor does a similar sort, although the specific details differ.)
  • The dynamic symbol table contains some symbols that are never looked up via the hash table. These symbols are left out of the hash table, reducing its size and hash chain lengths.
  • Perhaps most significantly, the GNU hash section includes a Bloom filter. This filter is used prior to hash lookup to determine if the symbol is found in the object or not.

The Bloom filter size is configurable. With ld.lld's settings, the produced DT_GNU_HASH is almost always smaller than DT_HASH. If something like Solaris direct bindings is leveraged, which mostly eliminates unsuccessful symbol lookups, we can set the Bloom filter size to 1 to remove the overhead.
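The hash function and the Bloom filter test can be sketched as follows. This is a simplified model, not glibc's code: the filter is a Python list of words, the helper names are invented for this example, and shift2 and the word size (64 bits for ELF64) are parameters that the real format stores in the section header or derives from the ELF class.

```python
def gnu_hash(name: bytes) -> int:
    """GNU hash: h = h * 33 + c, starting from 5381, truncated to 32 bits."""
    h = 5381
    for c in name:
        h = (h * 33 + c) & 0xFFFFFFFF
    return h


def bloom_bits(h: int, shift2: int, word_bits: int) -> int:
    """Mask of the two bits that hash h sets or tests within its Bloom word."""
    return (1 << (h % word_bits)) | (1 << ((h >> shift2) % word_bits))


def bloom_add(bloom: list, h: int, shift2: int, word_bits: int = 64) -> None:
    """What a linker would do for each symbol placed in the hash table."""
    bloom[(h // word_bits) % len(bloom)] |= bloom_bits(h, shift2, word_bits)


def bloom_may_contain(bloom: list, h: int, shift2: int, word_bits: int = 64) -> bool:
    """False means the symbol is definitely absent; True means keep looking."""
    word = bloom[(h // word_bits) % len(bloom)]
    mask = bloom_bits(h, shift2, word_bits)
    return (word & mask) == mask


bloom = [0, 0]  # two filter words; the real word count is a power of two
for sym in (b"printf", b"malloc"):
    bloom_add(bloom, gnu_hash(sym), shift2=6)
print(bloom_may_contain(bloom, gnu_hash(b"printf"), shift2=6))
```

A negative answer lets the loader skip the object without touching its buckets, chains, or string table, which is where most of the saving on unsuccessful lookups comes from.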

Nowadays DT_GNU_HASH is pretty much universal among ELF operating systems.

DT_GNU_HASH transition

DT_GNU_HASH is superior to DT_HASH in almost all aspects except for slightly higher implementation complexity. The nice thing is that the transition is mostly transparent. As long as ld and rtld support the format, we can use it. (Well, I will soon talk about exceptions: programs may poke into rtld/libc internals and reimplement symbol lookup without supporting DT_GNU_HASH, which makes the transition less smooth.)

GNU ld made a transition to a --hash-style=both default. Some Linux distributions carried local patches to make GCC pass --hash-style=gnu to ld, so that most pieces of software used the format. E.g. Fedora Core 6 (released in 2006-10) made the switch. I saw a 2007 Gentoo post about using --hash-style=gnu. I haven't made a thorough review, but it appears that the majority of Linux distributions had switched to --hash-style=gnu by some point in the 2010s.

In 2011, the GCC commit "install.texi (Configuration): Document --with-linker-hash-style." added a configure option --with-linker-hash-style=, which was then adopted by distributions.

Generally there are two categories of reasons that --hash-style=gnu cannot be used.

  • ABI flaw
  • Custom dlsym implementation which only supports DT_HASH

MIPS cannot use DT_GNU_HASH because it sorts .dynsym in a different way (for a technique called IRIX Quickstart, which AFAIK has never been implemented on other operating systems), and that order is incompatible with DT_GNU_HASH's sorting requirement. See "All about Global Offset Table" for details.

mumble used to rely on DT_HASH. DT_GNU_HASH support was added in https://github.com/mumble-voip/mumble/commit/6f19d7ebfd7565843b3c56484af624afb5956c0f and https://github.com/mumble-voip/mumble/commit/9d3e53152a8df4059aeae9a00a3bbe438a4c56c0. libstrangle relied on DT_HASH: https://gitlab.com/torkel104/libstrangle/-/issues/59.

Part of this reliance comes down to whether reimplementing dlsym is necessary at all. Assuming that it is necessary in some cases, such software works if its build system specifies --hash-style=sysv or --hash-style=both to override the distribution default LDFLAGS, and if it makes sure it does not need DT_HASH from its shared object dependencies.

What happens if glibc libc.so.6 drops DT_HASH? The software almost assuredly uses libc.so.6 (on a Linux glibc system) and will likely break. This is an obvious instance of Hyrum's law to me:

With a sufficient number of users of an API, it does not matter what you promise in the contract: all observable behaviors of your system will be depended on by somebody.

glibc rtld continues supporting DT_HASH in user executables and shared objects, but glibc now leaves its own shared objects (e.g. libc.so.6, libpthread.so.0) to the GCC default. I don't think the presence of DT_HASH was ever promised as a contract. It is clearly an internal detail that user programs are not supposed to rely on.

Discussions

Is DT_HASH deprecated on Linux?

On every architecture except MIPS, DT_HASH has been de facto deprecated. Fedora's --hash-style=gnu transition in 2006 made DT_HASH executables and shared objects extremely rare. libc.so.6 did contain DT_HASH for a long time, but it was just a rare exception. Other distributions quickly caught up, and DT_HASH has been mostly extinct for a similarly long time.

Is omitting DT_HASH conforming to the generic ABI?

In general, a processor supplement ABI or an operating system ABI can replace a generic ABI feature, and we should not read too much into the generic ABI wording. When DT_GNU_HASH is shipped as a replacement, omitting the replaced feature DT_HASH is totally fine. glibc libc.so.6 has the OSABI value ELFOSABI_GNU. Nevertheless, it is worth discussing how DT_GNU_HASH fits the generic ABI and ELFOSABI_NONE.

In the glibc case, if you read "Do not use --hash-style=both for building glibc shared objects", you will probably agree that this was a nice code clean-up.

Is DT_HASH optional in the generic ABI?

If one reads the generic ABI wording literally, it says "mandatory", and therefore DT_HASH is not optional. Does this make sense?

Technically a dynamic loader does not need a hash table to perform symbol lookup. It can start at the beginning of the dynamic symbol table, specified by DT_SYMTAB, and scan to the end. Wait, in the absence of DT_HASH (DT_GNU_HASH is an extension, and we want a way without an extension), there is no reliable way to get the number of dynamic symbol table entries. I tend to think it is outside of the generic ABI's business to require something here. An ELF object can freely use an extension to provide the information. Specifying things in such a prescriptive way is not in the spirit of ELF. Michael Matz disagrees in a reply to "Making DT_HASH optional?".

Should DT_GNU_HASH upgrade ELFOSABI_NONE to ELFOSABI_GNU?

Ali Bahrami holds this opinion while Roland McGrath and I disagree. Roland's argument is that ELFOSABI_GNU is for extensions like STB_GNU_UNIQUE and STT_GNU_IFUNC, not for extra non-standard DT_* tags.

DT_GNU_HASH predates e_ident[EI_OSABI]/ELFOSABI_* and belongs to a generic range (outside of [DT_LOOS,DT_HIOS] and [DT_LOPROC,DT_HIPROC]). Cary Coutant proposed that we can retroactively add DT_GNU_HASH to the generic ABI and Ali Bahrami objected to the proposal.

Note: SHT_GNU_HASH belongs to an OS-specific range. If DT_GNU_HASH were accepted into the generic ABI, we would probably need to find a new section type value in a generic range.
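The range claims can be checked with a few lines of arithmetic. The constants below are the values as commonly defined in elf.h; they are reproduced from memory for this sketch, so treat them as assumptions and verify them against your own headers.

```python
# Dynamic tag values (assumed to match elf.h).
DT_HASH = 4
DT_GNU_HASH = 0x6FFFFEF5
DT_LOOS, DT_HIOS = 0x6000000D, 0x6FFFF000
DT_LOPROC, DT_HIPROC = 0x70000000, 0x7FFFFFFF

# Section type values (assumed to match elf.h).
SHT_GNU_HASH = 0x6FFFFFF6
SHT_LOOS, SHT_HIOS = 0x60000000, 0x6FFFFFFF


def in_range(value: int, lo: int, hi: int) -> bool:
    return lo <= value <= hi


# The dynamic tag falls outside both reserved ranges ...
dt_is_os_specific = in_range(DT_GNU_HASH, DT_LOOS, DT_HIOS)
dt_is_proc_specific = in_range(DT_GNU_HASH, DT_LOPROC, DT_HIPROC)
# ... while the section type sits inside the OS-specific range.
sht_is_os_specific = in_range(SHT_GNU_HASH, SHT_LOOS, SHT_HIOS)

print(dt_is_os_specific, dt_is_proc_specific, sht_is_os_specific)
```

With these values, DT_GNU_HASH lands just above DT_HIOS and below DT_LOPROC, whereas SHT_GNU_HASH is inside [SHT_LOOS, SHT_HIOS].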

There is a related issue that Linux has used ELFOSABI_NONE for GNU-specific things for many years. E.g. just using GNU symbol versioning does not upgrade ELFOSABI_NONE to ELFOSABI_GNU. Very few features use ELFOSABI_GNU as an indicator, and ELFOSABI_LINUX was later defined as an alias for ELFOSABI_GNU. The OSABI values can technically facilitate different systems running non-native objects. In reality this interoperability isn't done very smoothly.

For Linux and many BSD systems, we are now in an interesting situation:

  • We use GNU and LLVM toolchains.
  • Many features are provided for all ELF operating systems.

We do not have ELFOSABI_GNUBASE or ELFOSABI_LLVM. Forcing an OSABI value can be regarded as imposing (in some sense) unnecessary inconvenience.

If we really want to force an e_ident[EI_OSABI] value, what should we do? Cross compilation and build reproducibility are highly appreciated nowadays. For a given linker command line, producing different e_ident[EI_OSABI] values on different systems is a bad practice. Technically we can let the compiler driver pass -m emulation to ld and let ld set e_ident[EI_OSABI] according to the emulation. As a linker maintainer, I think this is inconvenient and unnecessary when the produced ELF object files are quite homogeneous on many systems. For a new OS developer, such a distinction is unnecessary, too.

DT_SYMTABSZ or DT_SYMTAB_COUNT

The second word in a DT_HASH hash table is nchain, which equals the number of dynamic symbol table entries. People agree that a direct way of obtaining the number would be great. We could add DT_SYMTABSZ to the generic ABI. In practice ELF consumers want to know the number of entries, not the size of the symbol table, so DT_SYMTAB_COUNT would be more convenient.

The argument favoring DT_SYMTABSZ is that there are precedents such as DT_PLTRELSZ, DT_RELASZ, and DT_RELSZ.
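A small sketch of how consumers recover the count today, and how the two proposed tags would relate to each other. The function names are invented for this example. The code assumes a little-endian table with 32-bit words (the common case, though not universal across architectures) and ELF64 symbol entries of 24 bytes, which is what DT_SYMENT reports there.

```python
import struct

ELF64_SYM_SIZE = 24  # sizeof(Elf64_Sym); DT_SYMENT carries this value


def dynsym_count_from_sysv_hash(hash_table: bytes) -> int:
    """Read nchain, the second word of a DT_HASH table."""
    _nbucket, nchain = struct.unpack_from("<II", hash_table, 0)
    return nchain


def symtab_size(count: int, syment: int = ELF64_SYM_SIZE) -> int:
    """Size in bytes that a hypothetical DT_SYMTABSZ would carry."""
    return count * syment


# A toy table: nbucket = 2, nchain = 4, followed by zeroed buckets and chains.
table = struct.pack("<II", 2, 4) + bytes(4 * (2 + 4))
count = dynsym_count_from_sysv_hash(table)
print(count, symtab_size(count))
```

With a DT_SYMTABSZ-style tag, a consumer would divide by DT_SYMENT to get the count. With a DT_SYMTAB_COUNT-style tag, the count would be available directly.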

Could the accident have been detected earlier?

On the software side

"Easy Anti-Cheat" developers probably missed the fact that on many Linux distributions, most executables and shared objects have not had .hash/DT_HASH for a long time.

On the user side

Gentoo users noticed the issue back in 2022-04 (https://github.com/anyc/steam-overlay/issues/309). Sam James worked around "Easy Anti-Cheat"'s reliance on DT_HASH with "sys-libs/glibc: re-enable DT_HASH". This workaround was not widely known. Upstream glibc subsequently happened to make a different change with a similar effect, essentially dropping DT_HASH from glibc-provided shared objects.

"Easy Anti-Cheat" is proprietary and IMO niche. It is probably popular among the gaming community but isn't that common when taking the whole Linux glibc community into account.

All in all, the issue was identified quickly, and distribution developers quickly worked around it by adding DT_HASH back to glibc-provided shared objects. Many thanks for their hard work.

On 2022-08-11, Frederik Schwan pushed "re-add DT_HASH to glibc shared objects removed in 2.36" for Arch Linux.

Things for the operating systems using GNU and LLVM toolchains to sort out

Take Linux x86-64 as an example (for other processors, just check out the relevant psABI (processor supplement ABI)). We have these ABI documents:

None of the documents specifies DT_GNU_HASH. Having a specification would provide great value. The document should also state that DT_HASH is optional.

