Patching Bash 5.0 so that -x prints without executing: static deobfuscation of Shell scripts

Posted on 2019-03-27

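In attack and defense work, obfuscated JavaScript and PowerShell scripts are common, while obfuscated Shell scripts are seen less often. Yet Shell is a simple and flexible programming language, and its scripts can be obfuscated just as thoroughly as JS or PowerShell. Obfuscation is usually meant to evade automated detection, and it also makes manual analysis harder. For example, the single command cat /etc/passwd can be written in the following 3 lightly obfuscated ways: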

cat /et$'c/pa\u0000/notexist/path'sswd

test=/ehhh/hmtc/pahhh/hmsswd

cat ${test//hh??hm/}

tmp_str=saudoihfnssoirtgn

cat $(echo /e)tc$(echo /pa)${tmp_str:9:2}wd
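Why do all three forms end up as /etc/passwd? The sketch below prints the expanded words with printf instead of handing them to cat, so no file is read. It assumes a bash of version 4.2 or later is on the PATH (needed for \u inside $'...'); the function name show_expansions is made up for this illustration.

```shell
# Feed the three obfuscated words to bash, but print them instead of running cat.
show_expansions() {
  bash <<'EOF'
# 1. ANSI-C quoting: \u0000 is a NUL byte, which ends the quoted string early,
#    so "/notexist/path" is silently dropped and only "c/pa" survives.
printf '%s\n' /et$'c/pa\u0000/notexist/path'sswd

# 2. Pattern substitution: every match of the glob hh??hm is deleted.
test=/ehhh/hmtc/pahhh/hmsswd
printf '%s\n' ${test//hh??hm/}

# 3. Command substitution plus a substring: characters 9 and 10 of tmp_str are "ss".
tmp_str=saudoihfnssoirtgn
printf '%s\n' $(echo /e)tc$(echo /pa)${tmp_str:9:2}wd
EOF
}

show_expansions
```

Each of the three lines printed should read /etc/passwd, matching what the -x trace at the end of this post shows.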

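More heavily obfuscated Shell scripts also show up in real attacks from time to time. For example, the cron.sh in the case described in the post systemdMiner 借鸡下蛋，通过 DDG 传播自身 is obfuscated so heavily that almost nothing can be made out by eye.

For such obfuscated Shell scripts, the most convenient approach is Bash's -x option, which peels off the obfuscation layer by layer until the original code is recovered.

[Screenshot: the final part of deobfuscating the cron.sh above with bash -x]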

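But this approach has a problem: bash -x <SH_SCRIPT_FILE> really does execute the Shell script.

As a result, a malicious obfuscated Shell script can only be deobfuscated this way by running it inside a Linux virtual machine. That is fine if all I want is its dynamic behavior. But what if I only want to analyze a malicious Shell script statically, without actually executing it, and without bothering to spin up a separate virtual machine just to recover it? Is there no tool that can recover it directly? Or could I write a small tool that deobfuscates Shell scripts automatically from a few simple obfuscation rules?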
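The point that -x traces and executes is easy to check with a harmless stand-in for a malicious script. This is only a sketch: the function name xtrace_side_effect and the marker file are made up, and a stock bash is assumed to be on the PATH.

```shell
xtrace_side_effect() {
  workdir=$(mktemp -d) || return 1
  # A harmless stand-in for a malicious script: it only creates a marker file.
  printf '%s\n' 'touch "$1/marker"' > "$workdir/demo.sh"
  # Trace the script; the trace itself (on stderr) is not needed here.
  bash -x "$workdir/demo.sh" "$workdir" 2>/dev/null
  if [ -e "$workdir/marker" ]; then
    echo "marker exists: the traced script was executed"
  else
    echo "marker missing: the traced script was not executed"
  fi
  rm -rf "$workdir"
}

xtrace_side_effect
```

With an unpatched bash the first message is printed, which is exactly the side effect that makes plain bash -x unsuitable for static analysis.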

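I could not find such a tool. After some research, it turned out that the ways of obfuscating Shell code are rather varied, so writing my own tool that parses the syntax and reconstructs the final code would be hard going.

Then I thought: Bash is free software and its source can be downloaded directly, so why not try patching the Bash source so that -x only prints the deobfuscated Shell code without executing it, which would achieve static deobfuscation? Searching the web for material on Shell deobfuscation, I found the article linux命令反混淆-忙里偷闲, which uses the same idea but does not disclose the technical details of patching the Bash source.

So I had to try it myself, and ended up with an imperfect patch for the Bash 5.0 source that nevertheless basically meets the need for static deobfuscation. This post records the reasoning behind the patch.

Bash 5.0 source download: https://ftp.gnu.org/gnu/bash/bash-5.0.tar.gz

The most direct idea is to find the place in the Bash source where Shell code is finally executed and patch it, based on the -x option, so that the code is not executed. This roughly requires 3 things: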

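1. Identify how -x is represented in the code. Bash must parse its arguments at startup, and for -x it must set some marker so that later stages know to print the relevant code.
2. Work out the rough flow from Bash startup, through parsing the Shell script and expanding its obfuscated code, to final execution, and in particular the key function call paths along the way.
3. Find the place where the deobfuscated Shell code is finally executed and patch it using the -x marker: if -x is enabled, do not execute the code; otherwise execute it as usual.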

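This kind of work is inevitably top-heavy: a lot of effort goes in up front, studying, designing, and checking the plan, and in the end only a few changes are needed at key points. It is like repairing a large, intricate machine: extensive diagnosis first, then tightening two screws finishes the job.

Bash 5.0 has nearly 200,000 lines of code, which is not a small project. Fortunately, this task does not require understanding all of it.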

-x: echo_command_at_execute

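The main() function of Bash 5.0 lives in shell.c. Following its execution flow down to the argument parsing shows how Bash 5.0 handles the -x option. The key steps in main() (mainly the key function call paths) are described below in order.

setjmp_nosigs():

Sets a jump target so that early errors can be caught.

xtrace_init():

Initializes the xtrace module, which sets up the underlying output for the -x option. In the Bash source, -x is called xtrace, which presumably means tracing the execution of Shell code. print_cmd.c contains the following definitions: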

void
xtrace_init ()
{
  xtrace_set (-1, stderr);
}

void
xtrace_reset ()
{
  if (xtrace_fd >= 0 && xtrace_fp)
    {
      fflush (xtrace_fp);
      fclose (xtrace_fp);
    }
  else if (xtrace_fd >= 0)
    close (xtrace_fd);

  xtrace_fd = -1;
  xtrace_fp = stderr;
}

As the last assignments show, xtrace writes its output to stderr.
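Because the trace goes to stderr, it can be separated from a script's normal output with ordinary redirections. A minimal sketch, assuming a stock bash on the PATH; the function name trace_only and the sample command are made up, and PS4 is set explicitly so the trace prefix is the default "+ ".

```shell
trace_only() {
  # 2>&1 first points stderr (the trace) at the caller's stdout,
  # then >/dev/null discards the traced command's own stdout.
  PS4='+ ' bash -x -c 'greeting=hello; echo "$greeting"' 2>&1 >/dev/null
}

trace_only
# prints:
# + greeting=hello
# + echo hello
```

The order of the two redirections matters: written the other way round, both streams would end up in /dev/null.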

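check_dev_tty():

As the name suggests, checks the tty device.

set_default_locale():

Sets the current locale.

uidget():

Gets the current user's uid/gid/euid/egid.

set_shell_name(argv[0]):

Names the current Shell after argv[0] (the file name Bash was invoked as).

parse_long_options():

Parses Bash's long options. Bash distinguishes long options from ordinary options, as the output of bash --help shows: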

➜ mybash --help

GNU bash, version 5.0.0(1)-release-(x86_64-pc-linux-gnu)

Usage: mybash [GNU long option] [option] ...

mybash [GNU long option] [option] script-file ...

GNU long options:

--debug

--debugger

--dump-po-strings

--dump-strings

--help

--init-file

--login

--noediting

--noprofile

--norc

--posix

--pretty-print

--rcfile

--restricted

--verbose

--version

Shell options:

-ilrsD or -c command or -O shopt_option (invocation only)

-abefhkmnptuvxBCHP or -o option

Type `mybash -c "help set"' for more information about shell options.

Type `mybash -c help' for more information about shell builtin commands.

Use the `bashbug' command to report bugs.

bash home page: <http://www.gnu.org/software/bash>

General help using GNU software: <http://www.gnu.org/gethelp/>

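As you can see, my target, -x, is not one of the long options.

parse_shell_options():

Parses Bash's ordinary options. Their detailed descriptions can be printed with bash -c "help set", and they include the -x option I am looking for: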

-x Print commands and their arguments as they are executed.

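parse_shell_options() is also implemented in shell.c. Going through the switch-case structure in that function, however, shows that -x is not handled there directly. That leaves the default branch of the switch, which for -x amounts to calling change_flag('x', '-').

change_flag(flag, on_or_off) is defined in flags.c. It begins by calling flags.c::find_flag(), which looks up the target flag in an array of structs named shell_flags, defined as follows: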

const struct flags_alist shell_flags[] = {
  /* Standard sh flags. */
  { 'a', &mark_modified_vars },
#if defined (JOB_CONTROL)
  { 'b', &asynchronous_notification },
#endif /* JOB_CONTROL */
  { 'e', &errexit_flag },
  { 'f', &disallow_filename_globbing },
  { 'h', &hashing_enabled },
  { 'i', &forced_interactive },
  { 'k', &place_keywords_in_env },
#if defined (JOB_CONTROL)
  { 'm', &jobs_m_flag },
#endif /* JOB_CONTROL */
  { 'n', &read_but_dont_execute },
  { 'p', &privileged_mode },
#if defined (RESTRICTED_SHELL)
  { 'r', &restricted },
#endif /* RESTRICTED_SHELL */
  { 't', &just_one_command },
  { 'u', &unbound_vars_is_error },
  { 'v', &verbose_flag },
  { 'x', &echo_command_at_execute },

  /* New flags that control non-standard things. */
#if 0
  { 'l', &lexical_scoping },
#endif
#if defined (BRACE_EXPANSION)
  { 'B', &brace_expansion },
#endif
  { 'C', &noclobber },
  { 'E', &error_trace_mode },
#if defined (BANG_HISTORY)
  { 'H', &histexp_flag },
#endif /* BANG_HISTORY */
  { 'I', &no_invisible_vars },
  { 'P', &no_symbolic_links },
  { 'T', &function_trace_mode },
  {0, (int *)NULL}
};

So for the -x flag, the marker inside the Bash source is the variable echo_command_at_execute, which is defined in flags.c as follows:

/* Non-zero means type out the command definition after reading, but
   before executing. */
int echo_command_at_execute = 0;

Once -x is enabled when Bash starts, flags.c::change_flag() sets echo_command_at_execute to 1:

  value = find_flag (flag);

  if ((value == (int *)FLAG_UNKNOWN) || (on_or_off != FLAG_ON && on_or_off != FLAG_OFF))
    return (FLAG_ERROR);

  old_value = *value;
  *value = (on_or_off == FLAG_ON) ? 1 : 0; // #define FLAG_ON '-'

With that, the first task is done: I have found the variable that stands for the -x option in the source.
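The state of this flag is also visible from the Shell side: the special parameter $- lists the single-letter options currently in effect, so it contains x whenever xtrace is on. A small sketch, assuming a stock bash on the PATH; the function name xtrace_state is made up for this illustration.

```shell
xtrace_state() {
  # "$@" carries any extra options for the new bash (for example -x).
  # The trace produced under -x goes to stderr and is discarded here.
  bash "$@" -c 'case $- in *x*) echo "x is on";; *) echo "x is off";; esac' 2>/dev/null
}

xtrace_state        # prints: x is off
xtrace_state -x     # prints: x is on
```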

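Parsing the Shell script and executing Shell commands

Next, continuing through the code in shell.c::main(), I look for the key steps that parse the Shell script and execute Shell commands.

init_interactive():

If Bash is being started interactively, this function initializes the interactive Shell.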

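init_noninteractive():

If Bash is being started non-interactively, this function initializes the non-interactive Shell. Running a script file, or a command string with -c, starts Bash non-interactively, and that is the mode in play when deobfuscating with bash -x <SH_SCRIPT_FILE>.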

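shell_initialize():

Initializes the Shell as a whole. From this point on, the Shell should be usable.

set_default_lang():

Sets the default language environment.

set_default_locale_vars():

Sets the default locale variables.

If Bash is running interactively, some further setup follows. After that, Bash reads some of its configuration files (.rc/.profile), initializes the Bash history, and so on.

If a Bash script file is to be executed, any script arguments also have to be bound to the script (the bind_args() function). In this step the Shell script's file name is read and stored in the variable shell_script_filename.

If -c is used to execute a single Bash command, run_one_command() is called to do it.

Only at the very end comes the case of executing a Shell script file.

open_shell_script(shell_script_filename):

Opens the Shell script file.

set_bash_input():

Sets up Bash's input environment.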

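reader_loop():

Reads, parses, and executes the code in the Shell script. reader_loop() is defined in eval.c. For each Shell command it calls read_command(), which drives the parser and builds a generic COMMAND structure; once the command has been executed, dispose_cmd.c::dispose_command() frees that structure. The COMMAND structure is defined in command.h as follows: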

/* What a command looks like. */
typedef struct command {
  enum command_type type;   /* FOR CASE WHILE IF CONNECTION or SIMPLE. */
  int flags;                /* Flags controlling execution environment. */
  int line;                 /* line number the command starts on */
  REDIRECT *redirects;      /* Special redirects for FOR CASE, etc. */
  union {
    struct for_com *For;
    struct case_com *Case;
    struct while_com *While;
    struct if_com *If;
    struct connection *Connection;
    struct simple_com *Simple;
    struct function_def *Function_def;
    struct group_com *Group;
#if defined (SELECT_COMMAND)
    struct select_com *Select;
#endif
#if defined (DPAREN_ARITHMETIC)
    struct arith_com *Arith;
#endif
#if defined (COND_COMMAND)
    struct cond_com *Cond;
#endif
#if defined (ARITH_FOR_COMMAND)
    struct arith_for_com *ArithFor;
#endif
    struct subshell_com *Subshell;
    struct coproc_com *Coproc;
  } value;
} COMMAND;

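Building on this base structure, Bash defines more specific command types, and different operations are later carried out for each of them. See the code in command.h for details.

Finally, eval.c::reader_loop() passes each parsed Shell command to execute_cmd.c::execute_command(), which is the entry point for evaluating and executing every Shell command.

The implementation of execute_command() is very short: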

int
execute_command (command)
     COMMAND *command;
{
  struct fd_bitmap *bitmap;
  int result;

  current_fds_to_close = (struct fd_bitmap *)NULL;
  bitmap = new_fd_bitmap (FD_BITMAP_DEFAULT_SIZE);
  begin_unwind_frame ("execute-command");
  add_unwind_protect (dispose_fd_bitmap, (char *)bitmap);

  /* Just do the command, but not asynchronously. */
  result = execute_command_internal (command, 0, NO_PIPE, NO_PIPE, bitmap);

  dispose_fd_bitmap (bitmap);
  discard_unwind_frame ("execute-command");

#if defined (PROCESS_SUBSTITUTION)
  /* don't unlink fifos if we're in a shell function; wait until the function
     returns. */
  if (variable_context == 0 && executing_list == 0)
    unlink_fifo_list ();
#endif /* PROCESS_SUBSTITUTION */

  QUIT;
  return (result);
}

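Its core job is to set up the environment for executing Shell code and then hand the COMMAND over to execute_cmd.c::execute_command_internal() for execution.

The implementation of execute_command_internal() is quite complex. Depending on the type of the COMMAND and the action involved, it calls many specific functions, and several of those call execute_command_internal() again one or two levels down. The result is a particularly tangled web of recursive calls.

Decompiling the Bash binary in IDA Pro gives a rough picture of the references to execute_command_internal().

[Screenshot: IDA Pro xrefs to execute_command_internal()]

However, the xrefs in IDA do not show the full layered, recursive call relationships of this function, so I put together a tree by hand that makes them clear at a glance: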

execute_cmd.c::execute_command_internal()

|----execute_in_subshell()

|----|---->execute_command_internal()

|----execute_coproc()

|----|---->execute_in_subshell()

|----|----|---->execute_command_internal()

|----time_command()

|----|---->execute_command_internal()

|----execute_simple_command()

|----|---->execute_null_command()

|----|---->xtrace_print_word_list()

|----|---->builtin_address()--> exec_builtin_fg_or_bg

|----|---->start_job()

|----|---->find_shell_builtin()

|----|----execute_subshell_builtin_or_function()

|----|----|---->execute_builtin()

|----|----|----execute_disk_command()

|----|----|----|---->search_for_command()

|----|----|----|---->find_function()

|----|----|----|----execute_shell_function()

|----|----|----|----|----execute_function()

|----|----|----|----|----|---->execute_command_internal()

|----|----|----|---->shell_execve()

|----|----|----|----|----|---->execve()

|----|----|----execute_function()

|----|----|----|---->find_function_def()

|----|----|----|---->execute_command_internal()

|----|----execute_builtin_or_function()

|----|----|---->execute_builtin()

|----|----|----execute_function()

|----|----|----|---->find_function_def()

|----|----|----|---->execute_command_internal()

|----|---->execute_disk_command()

|----execute_for_command()

|----|---->print_for_command_head()

|----|---->xtrace_print_for_command_head()

|----|----execute_command()---->execute_command_internal()

|----execute_arith_for_command()

|----|----eval_arith_for_expr()

|----|----|---->xtrace_print_arith_cmd()

|----|----|---->expr.c::evalexp()

|----|---->execute_command()---->execute_command_internal()

|----execute_select_command()

|----|---->xtrace_print_select_command_head()

|----|---->select_query()

|----|---->execute_command()---->execute_command_internal()

|----execute_case_command()

|----|---->xtrace_print_case_command_head()

|----|---->execute_command()---->execute_command_internal()

|----execute_while_command()

|----|----execute_while_or_until()

|----|----|---->execute_command()---->execute_command_internal()

|----execute_until_command()

|----|----execute_while_or_until()

|----|----|---->execute_command()---->execute_command_internal()

|----execute_if_command()

|----|---->execute_command()---->execute_command_internal()

|----execute_command_internal()

|----execute_connection()

|----|---->execute_command_internal()

|----|---->execute_command()---->execute_command_internal()

|----|----execute_pipeline()

|----|----|---->execute_command_internal()

|----|----|---->jobs.c::wait_for()

|----execute_arith_command()

|----|---->xtrace_print_arith_cmd()

|----|---->expr.c::evalexp()

|----execute_cond_command()

|----|----execute_cond_node()

|----|---->xtrace_print_cond_term()

|----execute_intern_function()

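This made it even harder to find where Shell code is really executed in the end.

Incidentally, wherever execute_cmd.c needs to print a Shell command for -x, it first checks whether -x is set and then prints the Shell code, like this:

  if (echo_command_at_execute)
    xtrace_print_case_command_head (case_command);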

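So there are two possible routes to finding where Shell code is finally executed and patching it:

1. In execute_cmd.c, use the echo_command_at_execute variable as a lead. Any function that checks this variable and then calls the matching xtrace_print_XXX function might be the "final function" that outputs a given type of COMMAND. In each of these, patch the point just before the Shell command is finally executed so that it only prints and does not execute.
2. Untangle the recursive calls around execute_command_internal() shown above and find the boundary of the recursion, which is the code that finally executes Shell commands. Recursion cannot go on forever, so there must be at least one base case that exits the recursion and carries out the final task. My goal is to find the base cases of this pile of multi-level recursion. Once these execute_XXX() functions have finished recursing into execute_command_internal(), they must execute the final, deobfuscated Shell code, and I can patch them right before that point based on echo_command_at_execute.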

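After some analysis, route 1 turned out to be a dead end. The functions that call xtrace_XXX() to output the various types of Shell command are not the "final functions" that execute them; they hand the Shell command on to other functions for execution.

That leaves route 2. After a painstaking untangling, I confirmed that the function that finally executes a Shell command is execute_cmd.c::shell_execve(), which calls the C library's execve() on the fully deobfuscated command. In the tree above it sits under execute_disk_command(), so it covers commands that run a program on disk.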

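Applying the patch

With the target found, the rest is simple. All I need to do is patch execute_cmd.c::shell_execve() just before its call to execve().

Yes, you read that right: after all the painstaking work of tracing through the source, only this one place needs to change.

With the patch in place, I compiled it, ran it, and tested a few pieces of obfuscated Shell code. The result: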

➜ cat test.sh

#!/bin/bash

#export PATH=$PATH:/bin:/sbin:/usr/bin:/usr/sbin:/usr/local/bin:/usr/local/sbin

cat /et$'c/pa\u0000/notexist/path'sswd

test=/ehhh/hmtc/pahhh/hmsswd;

cat ${test//hh??hm/};

tmp_str=saudoihfnssoirtgn

cat $(echo /e)tc$(echo /pa)${tmp_str:9:2}wd

printf "\n\n"

#eval "$(ijmduN3D=(\[ r f 5 4 G U \" a i s p 1 t \% \} \ e \) \/ \\ 0 b J k z 7 \] \; \{ \| D \( X 2 h 3 \= 9 V 8 w n \$ B c 6 d o);for s7SQJyu8 in 11 1 9 42 13 2 16 14 10 16 7 43 32 24 44 39 8 6 33 37 32 20 19 16 45 16 10 16 47 16 41 16 13 16 11 16 17 16 8 16 20 16 18 28 2 48 1 16 31 25 35 24 23 36 41 5 16 9 42 16 12 16 40 16 3 16 38 16 21 16 26 16 3 16 12 16 21 16 46 16 40 16 34 16 34 16 4 16 36 28 47 48 16 11 1 9 42 13 2 16 14 10 16 7 43 29 24 44 39 8 6 33 0 43 31 25 35 24 23 36 41 5 27 15 7 28 47 48 42 17 18 7 30 22 8 10 35;do printf %s "${ijmduN3D[$s7SQJyu8]}";done)"

➜ mybash -x test.sh

+ cat /etc/passwd

+ test=/ehhh/hmtc/pahhh/hmsswd

+ cat /etc/passwd

+ tmp_str=saudoihfnssoirtgn

++ echo /e

++ echo /pa

+ cat /etc/passwd

+ printf '\n\n'
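The commented-out eval line in test.sh follows a common pattern: a lookup table of characters, a list of indices, and a loop that reassembles the payload and hands it to eval. The sketch below shows the same pattern with a made-up table and payload (not the ones from test.sh), printing the assembled string instead of evaluating it. The function name decode_payload is hypothetical, and a bash on the PATH is assumed because arrays are a Bash feature.

```shell
decode_payload() {
  bash <<'EOF'
# Hypothetical lookup table; a real sample would shuffle many more characters.
table=(e c h o ' ' i)
# Indices into the table, spelling out the hidden command one character at a time.
for idx in 0 1 2 3 4 2 5; do
  printf %s "${table[$idx]}"
done
printf '\n'
EOF
}

decode_payload    # prints: echo hi
```

Replacing eval with a plain print still runs the decoding loop, so this shortcut is only reasonable when the decoder consists of nothing but the table and the loop.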

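Of course, the patch is still imperfect: it does not handle redirections and pipes, so the output is not quite right in some cases. If you have a better idea, please leave a comment.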
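For reference, this is how redirections and pipes look in the trace of a stock, unpatched bash: the redirection is not printed at all, and each stage of a pipeline gets its own trace line. Whether this is the exact imperfection meant above is an assumption on my part; the function name trace_of is made up for this sketch.

```shell
trace_of() {
  # Run a command string under bash -x, discard its normal output, and keep
  # only the trace lines. They are sorted because the stages of a pipeline
  # may emit their trace lines in either order.
  PS4='+ ' bash -x -c "$1" 2>&1 >/dev/null | LC_ALL=C sort
}

trace_of 'echo hi > /dev/null'     # the "> /dev/null" part does not appear
trace_of 'echo hi | tr a-z A-Z'    # one trace line per pipeline stage
```

The first call prints only "+ echo hi"; the second prints "+ echo hi" and "+ tr a-z A-Z" on separate lines, with nothing to show that the two were connected by a pipe.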
