Friday, March 23, 2012

Asmjit Based Loader

The reason I'm using asmjit is that it's much better than writing inline assembly. When I used to write memory corruption exploits and shellcode, I had to write __asm {} blocks, compile them, look at the generated asm in a debugger/hex editor, copy the bytes, create a char buffer with the data in hex and finally do stupidly crazy unreadable indirection to call it. Like ((void (*)(void)) &shellcode)(); I still don't understand that shit. Overall, it was a very delicate and irritating process.
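For what it's worth, that cast just says "treat this address as a function taking no arguments, then call it". Here is a minimal sketch of the same thing with and without a typedef. It assumes a normal compiled function as a stand-in for the shellcode buffer (real shellcode would have to live in executable memory), and the names CodeFn, stand_in, raw_code, call_inline and call_with_typedef are made up for illustration.

```cpp
#include <cassert>

// The function type we expect the raw address to implement:
// no arguments, returns int.
typedef int (*CodeFn)(void);

// Stand-in for generated code. Real shellcode would be bytes sitting in
// executable memory; a normal function keeps this sketch portable.
static int stand_in() { return 7; }

// Hypothetical helper: hands the stand-in back as an untyped pointer,
// which is all a shellcode buffer really is.
void* raw_code() { return reinterpret_cast<void*>(&stand_in); }

// The one-liner form: cast the raw address to a function pointer type
// and call it in the same expression.
int call_inline(void* code) {
    assert(code != nullptr);
    return ((int (*)(void)) code)();
}

// The same thing spelled out: name the function type once, cast to it,
// then call through the typed pointer.
int call_with_typedef(void* code) {
    assert(code != nullptr);
    CodeFn fn = reinterpret_cast<CodeFn>(code);
    return fn();
}
```

Both helpers do exactly the same work; the typedef version only moves the ugly part of the declaration somewhere you have to read it once.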

With asmjit I don't have to do any of that crap. Asmjit is great because it totally abstracts away how you create your instructions, gives you type safety and allows you to serialize the code into data (which I demonstrate in this post). It also contains functions for relocating addresses for when you inject into a remote process.

Asmjit exposes two objects, a compiler and an assembler. I'm not entirely sure about all of the differences, but from what I can tell the compiler seems to be an abstraction on top of the assembler. I believe it is for writing 'higher level' assembly, but in my case I want to write to registers directly because I know exactly what I want.

So what do I want? In this case I want to write a loader that can take in arbitrary dll names and function names. When I used to write shellcode I wasn't afforded one very important luxury. I was exploiting a *remote* process and had *no* idea where any addresses were. To call any Win32 functions you need two things: the base address of kernel32, and a method to find the addresses of symbol names. Basically you needed to hand-code your own GetProcAddress. If you're curious about an implementation, check out this oldie but goodie at http://www.harmonysecurity.com/files/kungfoo.asm. I'm pretty sure I copied/used that shellcode at some point in my past :). When I first started writing the loader for this post, I was actually rewriting that block of asm in asmjit! Then I thought to myself, what the hell am I doing? I already *know* the addresses of GetProcAddress/LoadLibrary! They're the same for all local processes, so whatever address I look up in my own process is valid in the code I generate.

So now I'm doing local injection, which gives me two things. First, I can easily get the base address of kernel32 using GetModuleHandle. Second, I can get the addresses of GetProcAddress and LoadLibrary by using none other than GetProcAddress. Then, with the awesomeness of asmjit, I can insert these addresses as immediate values directly into my Assembler. I can also write my data (such as the dll path and export name) directly into this Assembler by using the data() method and specifying the buffers and their sizes. I use a trick which is common in shellcode: jmp down to your data, then call back up to your code. The call instruction pushes the address of whatever follows it onto the stack as its return address, and since what follows it is your data, you get a pointer to the data for free. You can then either pop it off, or leave it where it is and use it directly as the argument to your next call. The rest of my code basically just goes through setting up the function calls and grabbing the addresses of the buffers, pretty simple really.
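To make the jmp/call layout concrete, here is a rough sketch that builds the raw bytes by hand instead of going through asmjit. It only lays the bytes out and never executes them. The names Blob, emit32 and build_jmp_call_blob are hypothetical, and the single ret is a stand-in for the real body of the loader.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Appends a 32-bit value in little-endian byte order, as x86 expects.
static void emit32(std::vector<uint8_t>& buf, int32_t value) {
    uint32_t u = static_cast<uint32_t>(value);
    for (int i = 0; i < 4; ++i)
        buf.push_back(static_cast<uint8_t>((u >> (8 * i)) & 0xFF));
}

struct Blob {
    std::vector<uint8_t> bytes;
    std::size_t start_offset;  // where the call jumps back to (L_start)
    std::size_t data_offset;   // where the string begins
};

// Lays out:  jmp L_data ; L_start: ret ; L_data: call L_start ; "text\0"
Blob build_jmp_call_blob(const std::string& text) {
    assert(!text.empty());
    Blob blob;
    std::vector<uint8_t>& b = blob.bytes;
    const std::size_t body_size = 1;  // the stand-in body is a single ret

    // jmp rel32 (opcode E9). The displacement is measured from the end of
    // the jmp, so skipping the body means a displacement of body_size.
    b.push_back(0xE9);
    emit32(b, static_cast<int32_t>(body_size));

    // L_start: the real loader code would go here.
    blob.start_offset = b.size();
    b.push_back(0xC3);  // ret

    // L_data: call rel32 (opcode E8) back up to L_start. The displacement
    // is measured from the end of the call, which is also the return
    // address the CPU pushes.
    const std::size_t call_end = b.size() + 5;
    b.push_back(0xE8);
    emit32(b, static_cast<int32_t>(blob.start_offset) -
              static_cast<int32_t>(call_end));

    // The string sits right after the call, so the pushed return address
    // is a pointer to it.
    blob.data_offset = b.size();
    b.insert(b.end(), text.begin(), text.end());
    b.push_back(0);
    return blob;
}
```

The point to notice is that data_offset always equals the offset just past the call, which is why the pushed return address can be handed straight to a function expecting a string pointer.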
Injecting a DLL and calling an exported function using asmjit


Things will of course need to change once I move this code into SoNew, as we will be injecting into a remote process. But without further ado, here's my test loader code, enjoy!


// AsmJitTest.cpp : Defines the entry point for the console application.
//
#include "stdafx.h"
// AsmJit - Complete JIT Assembler for C++ Language.
#include <windows.h> // GetModuleHandle, GetProcAddress, GetLastError
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#include <asmjit/asmjit.h>
#include <asmjit/memorymanager.h>

// This is the type of function we will generate
typedef int (*MyFn)();

int main(int argc, char* argv[])
{
 using namespace AsmJit;
 const char *dll = "C:\\Research\\SoNew\\Debug\\SoNewTestDll.dll";
 const char *exported_function = "RunTest";

 HMODULE kernel = GetModuleHandle(L"kernel32"); // need kernel32's base address
 FARPROC load_library = GetProcAddress(kernel, "LoadLibraryA"); // need ll
 FARPROC get_proc_address = GetProcAddress(kernel, "GetProcAddress"); // heh :>.
 if (load_library == NULL) {
  printf("load_library is null: %lu\n", GetLastError()); // GetLastError returns a DWORD
  return -1;
 }
 if (get_proc_address == NULL) {
  printf("get_proc_address is null: %lu\n", GetLastError());
  return -1;
 }
 
 // ==========================================================================
 // Create Assembler.
 Assembler a;
 FileLogger logger(stderr);
 a.setLogger(&logger);
 {
  Label L_lib = a.newLabel();
  Label L_start = a.newLabel();
  Label L_funcname = a.newLabel();
  Label L_callfunc = a.newLabel();
  Label L_exit = a.newLabel();
  
  // Prolog.
  a.push(ebp);
  a.mov(ebp, esp);
  a.jmp(L_lib);      // jmp down to where our lib/dll is.
  a.bind(L_start);     // oh hai again!
  // Start.
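  // The call at L_lib already pushed the address of our dll path, so it is
  // sitting on the stack as LoadLibraryA's argument. The next two
  // instructions are not needed; they only show how to get it into eax.
  //a.pop(eax);      // address of our dll.
  //a.push(eax);      // push on to stack for ll call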
  a.call((sysint_t)load_library);  // load our dll
  a.cmp(eax, 0);      // module should be stored in eax.
  a.je(L_exit);      // make sure we have a valid module handle
  a.mov(edx, eax);     // store module in edx
  a.jmp(L_funcname);     // get the exported_func's address
  a.bind(L_callfunc);
  // Same trick: the call at L_funcname pushed the address of the function
  // name. The next two instructions are not needed.
  //a.pop(eax);      // the name of our exported func
  //a.push(eax);      // push name of our exported func
  a.push(edx);      // push the module handle LoadLibraryA returned
  a.call((sysint_t)get_proc_address); // get exported_func's addr.
  a.cmp(eax, 0);      // func should be stored in eax.
  a.je(L_exit);      // if not bomb out
  a.call(eax);      // and call it!  
  // Epilog.
  a.bind(L_exit);
  a.mov(esp, ebp);
  a.pop(ebp);
  a.ret();
  // our "data" section
  a.bind(L_lib);
  a.call(L_start);
  // write our dll path as data.
  a.data(dll, strlen(dll)+1);
  a.bind(L_funcname);
  a.call(L_callfunc);
  // write our exported function name as data
  a.data(exported_function, strlen(exported_function)+1);
 }
 // This is entirely to demonstrate how we can treat the 
 // code as data. If we are going to inject into a remote
 // process we will need to relocate it differently.
 // But for local processes it gets the point across!
 size_t code_size = a.getCodeSize();
 MemoryManager *mm = MemoryManager::getGlobal();
 void *p = mm->alloc(code_size, MEMORY_ALLOC_FREEABLE);
 if (p == NULL) {
  printf("Error allocation of our code buffer returned null!");
  return -1;
 }
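 void *data = a.make();      // make() hands back its own executable copy
 memcpy(p, data, code_size); // copy the raw bytes: the code really is just data
 a.relocCode(p);             // fix up the relative call targets for p's address
 mm->free(data);             // we run the copy in p, so release make()'s buffer
 printf("Code size: %u\nNow Calling...\n", (unsigned)code_size);
 ((void (*)(void)) p)();
 mm->free(p);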
 // Or screw all that above noise and just cast and call.
 // MyFn fn = function_cast<MyFn>(a.make());
 // fn();
 // MemoryManager::getGlobal()->free((void*)fn);

 return 0;
}
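The comment in the listing about relocating differently for a remote process comes down to simple arithmetic. A near call on x86 (opcode E8) stores its target as a displacement from the end of the call instruction, so the same absolute target needs a different displacement once the code lives at a different address. Below is a small sketch of that calculation, assuming 32-bit addresses; rel32_for and patch_call are hypothetical helpers, not asmjit functions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Displacement stored in a 5-byte call/jmp rel32: the target address minus
// the address of the instruction that follows the call.
int32_t rel32_for(uint32_t site, uint32_t target) {
    int64_t disp = static_cast<int64_t>(target) -
                   (static_cast<int64_t>(site) + 5);
    assert(disp >= INT32_MIN && disp <= INT32_MAX);
    return static_cast<int32_t>(disp);
}

// Hypothetical helper: re-points the E8 call found at `offset` inside
// `code` at the absolute address `target`, assuming the buffer will run
// at address `base`.
void patch_call(std::vector<uint8_t>& code, std::size_t offset,
                uint32_t base, uint32_t target) {
    assert(offset + 5 <= code.size());
    assert(code[offset] == 0xE8);
    uint32_t disp = static_cast<uint32_t>(
        rel32_for(base + static_cast<uint32_t>(offset), target));
    // Write the displacement little-endian, right after the opcode byte.
    for (int i = 0; i < 4; ++i)
        code[offset + 1 + i] = static_cast<uint8_t>((disp >> (8 * i)) & 0xFF);
}
```

For a remote process the same patching has to be done against the address the code will have in the target, not the address of the local buffer it was built in.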
