0

The Log4j mess

 2 years ago
source link: https://lwn.net/Articles/878390/
Go to the source link to view the article. You can view the picture content, updated content and better typesetting reading experience. If the link is broken, please click the button below to view the snapshot at that time.
neoserver,ios ssh client

The Log4j mess

Posted Dec 12, 2021 21:02 UTC (Sun) by roc (subscriber, #30627) [Link]

Given that the vulnerability is apparent once you understand the feature --- why wasn't it identified earlier? Seems like lots of people had enough information to identify the problem. Interesting food for thought.

The Log4j mess

Posted Dec 13, 2021 1:48 UTC (Mon) by Cyberax (✭ supporter ✭, #52523) [Link]

It's a low-level library that normally "just works", and people use it without giving it a second thought. It's not even similar to something like OpenSSL, because it doesn't deal with Internet-facing complicated protocols.

The Log4j mess

Posted Dec 12, 2021 21:34 UTC (Sun) by tialaramex (subscriber, #21167) [Link]

As I understand it, it's also very clumsy engineering. Log4j doesn't, even now, seem to understand the difference between the log message format and the formatted log message.

When this "feature" still existed by default in the library, their documentation (I looked at the Wayback Machine entry in November) seems pretty cheerful about it, even extolling the virtues of recursive lookups since hey, it's just a string, we can process it in a loop...

It's tempting when over-engineering something - as will inevitably happen for a second system like log4j2 was - to make everything that could possibly vary into a variable. Why not right? But the lesson we've learned in decades of Software Engineering is that constraint is actually good - yes I want to be able to make everything a variable if I need to, including whether everything is a variable, but that should not be the default, the default should be something much less free-form, needing me to explicitly opt into ever crazier Alice-in-wonderland APIs when and if I need them, which usually I won't.

Imagine if 99.9% of the world's log4j usage was calling log("Only this format is special", these, parameters, are, not, parsed); Unable to change constant strings like "Only this format is special" and "Username {} not present in database" lots of possible attack paths are stopped dead.

Sure, the 0.1% of the world that actually needed log_special(all, these, parameters, are, parsed); or log_array(array_of_parameters) is in a panic to fix their code, and it is at least possible that some of this 0.1% is exposed somewhere it really shouldn't have been, but the people being exposed at least got some value for what they risked.

The Log4j mess

Posted Dec 12, 2021 23:33 UTC (Sun) by bartoc (subscriber, #124262) [Link]

I think you are mostly correct here. It's the same insight that tcl has with respect to it's substitution behavior (it never double expands), and the same bug that makes bash injection so easy (unless you turn off this somewhat surprising behavior by setting IFS to nul).

I don't think splitting things into a format string + arguments form is necessary to prevent stuff like this though, only that that form makes the problematic behavior seem much more insane than doing it in the "everything is one string" form.

Consider SQL injection: "SELECT ${some_variable} FROM my_table;". The problem here is not that the replacement ("${some_variable}") is inside the string instead of after it (passed as a parameter) but that there's two parsers that process the string one after another, the language's "string interpolation" parser and then the actual SQL parser. The first parser eliminates the information that "${some_variable}" is one unique "thing" that should be expanded verbatim, so the second one has no way of knowing not to parse its content as commands.

If the logging "formatter" and the string interpolation "formatter" are actually the same library then this need not occur, the parser can just expand the replacement and do no other work. This does mean you need to say "log("${user_string}")" instead of "log(user_string)" though, so having the printf style API may still be less error prone (note that you do need the ability to directly parse and log runtime format specifications because it's useful for localization).

My preference is: never double expand, and require a special opt-in for passing non-constant format specification strings.


About Joyk


Aggregate valuable and interesting links.
Joyk means Joy of geeK