How to effectively detect and mitigate Trojan Source attacks in JavaScript codeb...

On November 1st, 2021, a publicly disclosed paper titled Trojan Source: Invisible Vulnerabilities described how malicious actors may employ Unicode bidirectional control characters to slip malicious source code into an otherwise benign codebase. The attack relies on reviewers mistaking the obfuscated malicious code for comments.

What is a Trojan Source attack?

Traditional code editors and code review tools often fail to display bidirectional control characters present in source code. This allows attackers to inject malicious code that looks benign. The vulnerability was made public on November 1st, 2021 and assigned CVE-2021-42574.

The following snippet, as displayed in VS Code, shows a Trojan Source attack in JavaScript source code:

// running internal logic for privileged users:
var accessLevel = "user";
if (accessLevel != "user‮ ⁦// Check if admin⁩ ⁦") {
    console.log("You are an admin.");
}

What about now with this screenshot:

A Trojan Source Stretched String attack implemented in JavaScript

Did you catch the issue with the above source code? If not, try examining that code snippet a little closer.

Here’s what’s happening: this is a Stretched String type of attack. The code on line 3 makes it look like the conditional expression compares the accessLevel variable against the value user, followed by a comment about the check. It may look harmless, but the truth is quite different.

In fact, the Unicode bidirectional characters on line 3 hide the actual string value that accessLevel is compared against. The text that looks like a comment is part of the string literal. Here is the real line 3 as the compiler would run it:

if (accessLevel != "user // Check if admin") {
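
To see which characters sit inside that string literal, the control characters can be replaced with visible tags. The following is a minimal sketch and is not taken from the paper: revealBidi and BIDI_NAMES are hypothetical names, and the literal assumes the character order RLO, LRI, PDI, LRI used in the paper's Stretched String example.

```javascript
// Bidirectional control characters named in the Trojan Source paper.
const BIDI_NAMES = {
  '\u202A': 'LRE', '\u202B': 'RLE', '\u202C': 'PDF',
  '\u202D': 'LRO', '\u202E': 'RLO',
  '\u2066': 'LRI', '\u2067': 'RLI', '\u2068': 'FSI', '\u2069': 'PDI'
};

// Replace each control character with a visible tag such as <RLO>.
function revealBidi(text) {
  return text.replace(/[\u202A-\u202E\u2066-\u2069]/g, (ch) => `<${BIDI_NAMES[ch]}>`);
}

// Assumed contents of the string literal, written with escapes so nothing is hidden.
const literal = 'user\u202E \u2066// Check if admin\u2069 \u2066';

console.log(revealBidi(literal));
// prints: user<RLO> <LRI>// Check if admin<PDI> <LRI>
console.log(literal === 'user');
// prints: false
```

Because the literal is never equal to "user", the comparison on line 3 does not behave the way it appears to on screen.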

The paper describes several types of abusing bidirectional control characters to inject malicious code into source: Commenting-Out, Stretched String, Invisible Functions, and Homoglyph Function. The researchers have provided JavaScript examples of all of these attacks being employed via this trojan-source repository on GitHub.

Although the use of bidirectional control characters is a novel approach, this sort of attack isn’t actually new and has been cited in prior mailing lists and discussion boards. For example, some references are this Golang issue from back in 2017 about disallowing RTL/LTR characters, or even this Bugzilla entry from 2011, titled [BiDi] Misleading display of bidirectional strings when RLO, LRO or PDF is used (note the use of Google Cache to access it).

How do you fix Trojan Source attacks?

The authors of the academic paper suggest that the fix belongs in code editors and IDEs, which should render such Unicode characters visibly, and in compilers, which should warn users about them.

Detecting Trojan Source attacks in source code

Your code editing and code review processes may be on platforms or tools that do not support highlighting of these dangerous bidirectional unicode characters. This means you may already have those bidirectional characters in your codebase.

So how do you find out if you have source code with bidirectional unicode characters? To help with that, I created an npm package called anti-trojan-source that scans a directory, or reads input from standard input (STDIN) and scans it for any such unicode characters that may be present in the text.

You can use npx to scan files as follows:

npx anti-trojan-source --files='src/**/*.js'

Or if you’d like to use it as a library in a JavaScript project:

import { hasTrojanSource } from 'anti-trojan-source'
const isDangerous = hasTrojanSource({
  sourceText: 'if (accessLevel != "user‮ ⁦// Check if admin⁩ ⁦") {'
})
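
For readers curious about what such a check involves, here is a minimal sketch of a scanner that reports where bidirectional characters occur. It is not the implementation of anti-trojan-source: findBidiCharacters and BIDI_PATTERN are hypothetical names, and the nine code points are an assumption based on the controls listed in the paper.

```javascript
// Assumption: these nine code points are the bidirectional controls to flag.
const BIDI_PATTERN = /[\u202A-\u202E\u2066-\u2069]/g;

// Return one finding per bidi character, with 1-based line and column.
function findBidiCharacters(sourceText) {
  const findings = [];
  sourceText.split('\n').forEach((lineText, index) => {
    for (const match of lineText.matchAll(BIDI_PATTERN)) {
      const hex = match[0].codePointAt(0).toString(16).toUpperCase();
      findings.push({ line: index + 1, column: match.index + 1, codePoint: `U+${hex}` });
    }
  });
  return findings;
}

// Sample input written with escapes so the control characters stay visible here.
const sample = [
  'var accessLevel = "user";',
  'if (accessLevel != "user\u202E \u2066// Check if admin\u2069 \u2066") {'
].join('\n');

console.log(findBidiCharacters(sample));
```

Reporting the line, column, and code point makes it easier to locate the character in an editor that does not render it.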

Preventing Trojan Source attacks in JavaScript with ESLint

But even better than finding existing issues is to proactively safeguard your codebase so that no Trojan Source attacks make their way into your source code at all. In the JavaScript community, we often rely on ESLint and its various plugins to enforce code quality and code style standards.

With eslint-plugin-anti-trojan-source, you can add an ESLint rule that stops developers and continuous integration and build systems from mistakenly merging code that is potentially malicious due to bidirectional Unicode characters.

Here is an example ESLint configuration, placed in a JavaScript project’s package.json:

"eslintConfig": {
    "plugins": [
        "anti-trojan-source"
    ],
    "rules": {
        "anti-trojan-source/no-bidi": "error"
    }
}

And an example output for a vulnerable snippet of code that slipped into the codebase:

$ npm run lint

/Users/lirantal/projects/repos/@gigsboat/cli/index.js
  1:1  error  Detected potential trojan source attack with unicode bidi introduced in this comment: '‮ } ⁦if (isAdmin)⁩ ⁦ begin admins only '  anti-trojan-source/no-bidi
  1:1  error  Detected potential trojan source attack with unicode bidi introduced in this comment: ' end admin only ‮ { ⁦'                    anti-trojan-source/no-bidi

/Users/lirantal/projects/repos/@gigsboat/cli/lib/helper.js
  2:1  error  Detected potential trojan source attack with unicode bidi introduced in this code: '"user‮ ⁦// Check if admin⁩ ⁦"

How is the ecosystem mitigating Trojan Source attacks?

IDEs such as VS Code have released versions that highlight these Unicode characters, so programmers take note of them and act with proper context when reviewing and editing code. Similarly, GitHub now shows a warning when a file viewed on GitHub contains bidirectional Unicode characters:

GitHub highlights warnings about potential dangerous source code injected into a code base through a Trojan Source attack and bidirectional unicode characters

Source: https://github.com/nickboucher/trojan-source/blob/main/JavaScript/stretched-string.js

However, note that not all types of Trojan Source attacks are highlighted by GitHub. For example, consider the following case, which the paper presents and dubs Invisible Functions:

Employing Invisible Functions Trojan Source attacks which the GitHub UI fails to warn about

As you can see in the JavaScript code snippet above, there aren’t any warnings from GitHub when reviewing this code. What’s actually going on there?

The function declaration on line 7 is actually written with a zero-width space Unicode character, U+200B, which makes it look as if this is a legitimate isAdmin() function.

We can verify this if we print out the code using a tool like bat, which is a clone of the UNIX cat tool, with better syntax highlighting and Git integration:
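
Where bat is not available, a comparable check can be sketched in JavaScript by escaping invisible characters before printing. This is only an illustration: escapeInvisible and INVISIBLE_PATTERN are hypothetical names, the set of code points is a small, non-exhaustive assumption, and the position of the zero-width space inside the name is made up for the example.

```javascript
// Assumption: a small, non-exhaustive set of invisible code points to flag.
const INVISIBLE_PATTERN = /[\u200B-\u200D\u2060\uFEFF]/g;

// Escape each invisible character as \uXXXX so it shows up in a review.
function escapeInvisible(text) {
  return text.replace(INVISIBLE_PATTERN, (ch) =>
    '\\u' + ch.charCodeAt(0).toString(16).toUpperCase().padStart(4, '0'));
}

// Two declarations that can render identically but are different text.
const visible = 'function isAdmin() {';
const hidden = 'function is\u200BAdmin() {';

console.log(escapeInvisible(visible)); // prints: function isAdmin() {
console.log(escapeInvisible(hidden));  // prints: function is\u200BAdmin() {
console.log(visible === hidden);       // prints: false
```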

Should compilers and runtimes mitigate Trojan Source attacks?

What about compilers and language runtimes? Most languages and runtimes, including Node.js, have decided against changing their compilers to reject these Unicode characters, effectively shifting the risk to code editors and to the humans who need to be more careful when reading and reviewing code.

That said, some languages, such as Zig, have looked favorably on raising a compiler error when bidirectional Unicode characters are detected in source code, while allowing the error to be bypassed with an explicit comment.
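
The general idea of "error by default, opt out explicitly" can be sketched as a standalone check. This is not how Zig or any other compiler implements it: checkSource is a hypothetical function, and the allow-bidi marker is an invented opt-out comment used purely for illustration.

```javascript
// Assumption: the nine bidirectional controls listed in the paper.
const BIDI = /[\u202A-\u202E\u2066-\u2069]/;
// Hypothetical opt-out marker; not the syntax of any real compiler.
const ALLOW_MARKER = 'allow-bidi';

// Return an error message per offending line that lacks the opt-out marker.
function checkSource(sourceText) {
  const errors = [];
  sourceText.split('\n').forEach((lineText, index) => {
    if (BIDI.test(lineText) && !lineText.includes(ALLOW_MARKER)) {
      errors.push(`line ${index + 1}: bidirectional control character in source`);
    }
  });
  return errors;
}

// Line 1 opts out explicitly, line 2 does not.
const source = [
  'const a = "x\u202Ey"; // allow-bidi: intentional test data',
  'const b = "x\u202Ey";'
].join('\n');

console.log(checkSource(source)); // reports only line 2
```

An opt-out like this shows up as ordinary visible text in a diff, so reviewers can see that an exception was requested.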

Resources on Trojan Source attacks

I hope you found this post useful for understanding Trojan Source attacks and how they can appear in the JavaScript ecosystem. To learn more about these attacks, I recommend the Trojan Source paper and the trojan-source repository on GitHub referenced above.
