Binbloom blooms: introducing v2

Introduction

Reverse-engineering hardware devices usually requires extracting data from memory, be it from an internal Flash of a SoC, an external NAND or SPI flash chip. Extracting memory content is part of the job, but once done we still need to analyze it and face the inevitable truth: we may be in front of an unknown memory dump or just have no idea of how information is stored in it, how it is loaded into the SoC or MCU memory and more generally where we can find interesting data and code. If you are into MCU/SoC firmware reverse-engineering this should sound familiar, as embedded Linux or other operating systems mostly rely on filesystems that can be identified and recovered with well-known tools.

These firmwares are strongly tied to a specific architecture that uses a given processor with its own peripherals and communication buses, with its own characteristics and specificities, making reverse-engineering a tedious task. This information may be found in the architecture documentation, when available. As a matter of fact, we need dedicated tools to quickly find some specific information before loading a firmware into our preferred disassembler:

architecture endianness, because it is better to know how values are stored in memory (and by the way how instructions are decoded);
the base address at which the firmware content is loaded (if the firmware is not a collage of various blocks of data and code).

Moreover, it could also be interesting to automatically detect interesting structures or arrays of structures such as the ones used to store Unified Diagnostic Services message IDs and related functions addresses for instance (these structures are very common in automotive ECU firmwares).

Guessing endianness

The endianness refers to the way integer values are stored in memory: least-significant byte first (little-endian) or most-significant byte first (big-endian, also known as network byte order). Guessing the endianness of an unknown firmware is not straightforward, but most of the existing tools consider these two options and try to determine which one gives the best results. There is no real alternative to this approach, and results are usually pretty good. Moreover, if you know the architecture your firmware is supposed to run on then you may know what endianness it supports (or not, e.g. ARM processors that handle both). Anyways, it is no big deal to figure out which one is used.

Finding a firmware base address

A firmware is usually mapped at a specific address in memory, depending on the architecture and its configuration. It could be loaded by a bootloader and stored at a particular address in RAM, or even be transparently mapped in memory and accessed through a dedicated bus. Supposing we do not know this address, how would we guess it based on what we have? We can only rely on information stored in the firmware, and based on this we would determine the most probable loading address.

Most of the existing tools like rbasefind, basefind.py, basefind.cpp, or even binbloom v1 try to find valuable data in the content of a firmware, such as text strings or pointers, and use them to recover the base address with more or less success. These methods will be detailed later in this blog post, as well as their pros and cons. The fact is we have tools that are able to guess or recover the base address of a given firmware, unless you have to deal with a 64-bit architecture such as AArch64 or there is no text strings in it. There is no magical tool, and the ones we use also have some flaws and limitations.

Issues and limitations

These tools cannot handle 64-bit firmwares because they were not designed to support them. They are also heavily dependent on the type of data stored inside the firmware, since it is the only input they can use to guess the corresponding base address. You have a firmware with no text strings and a few kilobytes of data? Don't expect too much, as a statistical analysis performed on a few kilobytes may not produce any reliable output.

The way pointers are determined by these tools is also a weakness, especially when a firmware contains more data than code. In this case, some 32-bit values may be considered as valid pointers whereas they only belong to some data stored in the firmware, thus introducing a bias in any statistical analysis and eventually leading to the wrong base address.

Nevertheless, the existing tools work pretty well for most of the 32-bit firmware files and memory dumps extracted from usual devices (well-known architecture used with well-known compiler). They are able to find one or more potential base addresses in most of the cases.

Introduction

Guessing endianness

Finding a firmware base address

Issues and limitations

Recommend

从 CloudWeGo 谈云原生时代的微服务与开源

如果你在大西洋遇见一艘无人驾驶的船，它可能真不是「幽灵船」

Galaxy Z Flip 3 Review: An iPhone user’s perspective on the foldable Android pho...

pgp

Cortana’s voice actor Jen Taylor ‘shed tears’ during Halo Infinite’s emotional m...

多地消费刺激政策利好袭来，提振消费电子产业链信心

不为人知的网络编程(十一)：从底层入手，深度分析TCP连接耗时的秘密

Add Build Info to an 11ty Site

睡前看短视频还是阅读书籍？

今日好价 0601

About Joyk