Mitigating DLL Injection Through Load Order Hijacking

--[ 1. Introduction

The Windows process model is remarkably unrestricted. From an engineering
perspective, this openness affords considerable flexibility in your system
design. Why _wouldn't_ you want to throw together a quick and dirty IPC
mechanism using window messages, or inject new code directly into a running
process? However, it is this same flexibility that is often abused by rogue
software developers or attackers to subvert applications into performing actions
which were not originally intended by the developer. These techniques are
commonly used for things like cheating in video games, bypassing software
licensing restrictions, and for process and behavioral monitoring in personal
security products (Antivirus software).

But this flexibility comes at a significant cost to software vendors and end
users. In the case of window messaging, the technique was so heavily abused in
so-called "Shatter Attacks" that, with the release of Windows Vista, Microsoft
introduced one of its most effective attack mitigations to date: User Interface
Privilege Isolation (UIPI), which prevents a process from sending most window
messages to processes running at a higher integrity level. This change to the
Windows security model effectively wiped out an entire class of Privilege
Escalation attacks overnight.

But one facet of the Windows process model that Microsoft has been less willing
to eradicate is Win32's WriteProcessMemory and CreateRemoteThread APIs, the
latter of which allows one process to start a new thread of execution within
the context of another. Over the years, there have been modifications to the
behavior of these APIs, notably the inability to interact with processes
running in a different terminal session. However, the vast majority of Windows
system software runs in the primary session at Medium IL. And herein lies the
problem.

--[ 2. DLL Injection 101

Writing to memory and creating a thread is fundamentally sufficient in and of
itself to execute code within the context of another process. Indeed, simply
allocating a chunk of Read/Write/eXecute (RWX) memory, writing code to it, and
starting execution at that address would suffice (modulo Arbitrary Code Guard
mitigations). In practice, however, this is an exceptionally time-consuming
way to build and maintain a working feature. It is much more common to let the
Windows loader do most of the heavy lifting for you. For this reason, pretty
much every invocation of CreateRemoteThread simply points to LoadLibrary with a
pointer to the path of a DLL to load. This whole flow is facilitated by a
significant weakness in the Windows implementation of Address Space Layout
Randomisation (ASLR): the base addresses of system DLLs are chosen once per
boot and then shared by every process, rather than being randomised per
process.

Therefore, achieving code injection generally follows this pattern:

 [1] processHandle = OpenProcess(PROCESS_ALL_ACCESS, ..., PID)
 [2] dllPath = VirtualAllocEx(processHandle, ..., PAGE_READWRITE)
 [3] WriteProcessMemory(processHandle, dllPath, PATH_TO_DLL, ...)
 [4] hKernel32 = GetModuleHandleW(L"kernel32.dll")
 [5] loadLibrary = GetProcAddress(hKernel32, "LoadLibraryW")
 [6] CreateRemoteThread(processHandle, ..., loadLibrary, dllPath, ...)

First, we call OpenProcess[1] to obtain a handle to the process into which we
want to inject. We will use this handle to interact with the process throughout
the rest of this flow. We then need to allocate a buffer within the process's
address space to hold the path string of the DLL. We achieve this by making a
call to VirtualAllocEx[2] (the "Ex" version allowing for the provision of a
target process handle in the first argument), then subsequently calling
WriteProcessMemory[3] with the newly allocated buffer in the second parameter
and the DLL path in the third.

Next we need to ascertain the address of the LoadLibrary routine, which is
exported from kernel32.dll (strictly as LoadLibraryA and LoadLibraryW, since
LoadLibrary itself is only a header macro). Fortunately, due to the
aforementioned limitations of Windows ASLR, kernel32 is already mapped into our
own address space at the same base address it occupies in the target, provided
both processes have the same bitness. As such, we simply take its module handle
with GetModuleHandleW[4] and use GetProcAddress[5] to obtain the address.
Finally, we can call CreateRemoteThread[6] with the address of LoadLibrary in
the "lpStartAddress" parameter and the address of the DLL path string in
"lpParameter".

Developers can use the fact that the Windows loader will call the library's
initialisation routine, DllMain, upon completion of the load to perform the
required logic. The signature of DllMain is documented on MSDN as:

     BOOL WINAPI DllMain(
 [1]   _In_ HINSTANCE hinstDLL,
 [2]   _In_ DWORD     fdwReason,
       _In_ LPVOID    lpvReserved
     );

DllMain is called with the handle to the current DLL instance (its base
address) in parameter hinstDLL[1], and the reason that DllMain is being invoked
in fdwReason[2]. The latter is important because DllMain is invoked by the
loader when the DLL is first loaded (after dynamic linking is completed), when
the DLL is unloaded, and also at the creation and destruction of threads (to
allow the DLL to initialise some thread-local data). Typically, DLLs used in
injection make use of the first of these: DLL_PROCESS_ATTACH.
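
The dispatch on fdwReason described above can be sketched as follows. This is
only an illustrative skeleton, not code from any particular injector: the
g_attached flag is a hypothetical stand-in for whatever start-up work a real
DLL would do, and the typedefs in the #else branch are assumptions added purely
so that the fragment also builds without the Windows SDK.

```c
#ifdef _WIN32
#include <windows.h>
#else
/* Stand-ins so the sketch builds without the Windows SDK (assumption). */
typedef int BOOL;
typedef unsigned int DWORD;
typedef void *HINSTANCE;
typedef void *LPVOID;
#define WINAPI
#define TRUE 1
#define DLL_PROCESS_DETACH 0
#define DLL_PROCESS_ATTACH 1
#define DLL_THREAD_ATTACH  2
#define DLL_THREAD_DETACH  3
#endif

/* Hypothetical flag standing in for the DLL's real start-up work. */
static int g_attached = 0;

BOOL WINAPI DllMain(HINSTANCE hinstDLL, DWORD fdwReason, LPVOID lpvReserved)
{
    (void)hinstDLL;     /* base address of this DLL; unused in the sketch */
    (void)lpvReserved;

    switch (fdwReason) {
    case DLL_PROCESS_ATTACH:
        /* Runs once, on the thread that performed the load. */
        g_attached = 1;
        break;
    case DLL_PROCESS_DETACH:
        /* Runs when the DLL is unloaded or the process exits. */
        g_attached = 0;
        break;
    case DLL_THREAD_ATTACH:
    case DLL_THREAD_DETACH:
        /* Per-thread notifications: nothing to do in this sketch. */
        break;
    }
    return TRUE;
}
```

Returning TRUE from the DLL_PROCESS_ATTACH case tells the loader that
initialisation succeeded and the DLL should stay mapped.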

For brevity, this article will mostly cover usermode DLL injection, but it is
important to note that DLLs can also be injected from the Kernel. Indeed,
kernelmode injection is more common in products such as AVs.

--[ 3. LdrRegisterDllNotification

As a medium integrity process running on a Windows device on the default
session, there is very little that you can do to prevent such injection attacks
(and indeed even less that you can do to prevent attacks from the Kernel).
However, a scantly-documented API exists hidden in the depths of ntdll called
LdrRegisterDllNotification. This routine (designed for use in usermode drivers
incorporating KDM) allows a usermode application to register a callback that is
invoked every time the loader maps a new DLL into memory.

Once registered, the loader will call the specified callback on the thread
performing the load (for an injected DLL, the newly created remote thread) once
the new DLL has been mapped into memory, but before DllMain is called. The
specified callback then has the opportunity to perform some action. At the most
basic level, the callback routine can:
 a) Check that this is not a DLL that is required under normal operation (it
    will get called at load time of _all_ DLLs, including Windows platform
    libraries), then
 b) Simply call TerminateThread on the current thread to prevent it from
    continuing execution into the library's DllMain.
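
Step (a) can be sketched as a case-insensitive allow-list lookup. The list
contents and the names k_expected_dlls, name_equals and is_expected_dll are
hypothetical. The function takes a counted string because, in the real
callback, the name would come from NotificationData->Loaded.BaseDllName, a
UNICODE_STRING whose Length field is in bytes (so divide by sizeof(WCHAR)) and
which is not guaranteed to be NUL-terminated.

```c
#include <stddef.h>
#include <wchar.h>
#include <wctype.h>

/* Hypothetical allow-list; a real one must cover every DLL the process
 * legitimately loads, including those pulled in lazily by the platform. */
static const wchar_t *const k_expected_dlls[] = {
    L"ntdll.dll",
    L"kernel32.dll",
    L"kernelbase.dll",
    L"user32.dll",
};

/* Case-insensitive comparison of a counted string against a
 * NUL-terminated one. Returns 1 on match, 0 otherwise. */
static int name_equals(const wchar_t *name, size_t name_len,
                       const wchar_t *expected)
{
    if (wcslen(expected) != name_len) {
        return 0;
    }
    for (size_t i = 0; i < name_len; i++) {
        if (towlower((wint_t)name[i]) != towlower((wint_t)expected[i])) {
            return 0;
        }
    }
    return 1;
}

/* Returns 1 if the first name_len characters of name match an entry in
 * the allow-list, 0 otherwise. */
int is_expected_dll(const wchar_t *name, size_t name_len)
{
    size_t count = sizeof(k_expected_dlls) / sizeof(k_expected_dlls[0]);
    for (size_t i = 0; i < count; i++) {
        if (name_equals(name, name_len, k_expected_dlls[i])) {
            return 1;
        }
    }
    return 0;
}
```

A base-name check alone is easy to defeat by naming the injected file after a
platform library, so comparing the full path (FullDllName) against expected
directories is likely the safer variant of the same idea.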

However, this introduces a race condition. The application needs to call
LdrRegisterDllNotification early enough in its startup that the injector cannot
begin creating a remote thread before the callback is registered. Even worse,
injections performed from the Kernel are often triggered by a callback
registered through PsSetCreateProcessNotifyRoutine(Ex), which fires _before_
the new process's main routine is called, making it impossible to win the race
by registering the callback in main.

--[ 4. DLL Load Order Hijacking

So now we are in a position where we need to register for a callback _before_
we get a chance to execute any code in main.

Fortunately, due to a quirk in the Windows loader, we actually get that chance
if we perform the call to LdrRegisterDllNotification in a DLL loaded immediately
(not delay-loaded) by the exe. The DllMain routine of each DLL is called
immediately after dynamic linking is completed, and before the call to the
executable's entry point (technically a shim added by the compiler and runtime
which initialises the standard library, gathers arguments and calls main, but
for the sake of simplicity we will refer to this whole bootstrapping as main).

Crucially, even Kernel callbacks registered through
PsSetCreateProcessNotifyRoutine do not get notified until after dynamic linking
is completed. It is for this reason that we can effectively mitigate many
instances of basic DLL injection through the use of a tiny DLL which, in its
DllMain, registers a callback through LdrRegisterDllNotification that simply
calls TerminateThread.

This technique does not account for the (not uncommon) model of injecting a DLL
with multiple exports. There, the first thread is a simple call to LoadLibrary
as seen in the flow in section 2, and a second call to CreateRemoteThread is
used to call into one of the exported functions. For this reason, our technique
can be further improved by parsing the Export Address Table of the newly
injected DLL, finding the offsets of all exported functions, and patching each
of them with a single-byte Return (RET, 0xC3) instruction. This way, when the
exported routines are called, they simply return without executing their
payload. Note that DllMain is normally reached through the PE entry point
rather than the export table, so patching exports alone does not neuter it.
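
The patching step can be sketched in isolation on a plain byte buffer. This is
a simplified model under stated assumptions rather than the real thing: the
names neuter_exports and is_forwarder are hypothetical, the image is an
ordinary writable array, and the RVA array is passed in directly. In a live
process the image would be the DLL's base address, the array would be the
export directory's AddressOfFunctions, and each write would have to be
bracketed by VirtualProtect.

```c
#include <stddef.h>
#include <stdint.h>

#define OPCODE_RET 0xC3u  /* x86/x64 near return */

/* An export RVA that lands inside the export directory itself is a
 * forwarder: it points at a "dll.Function" string, not at code. */
static int is_forwarder(uint32_t rva, uint32_t dir_rva, uint32_t dir_size)
{
    return rva >= dir_rva && rva - dir_rva < dir_size;
}

/* Overwrites the first byte of every exported routine with RET.
 * Returns the number of entries patched. */
size_t neuter_exports(uint8_t *image, size_t image_size,
                      const uint32_t *function_rvas, uint32_t function_count,
                      uint32_t export_dir_rva, uint32_t export_dir_size)
{
    size_t patched = 0;

    for (uint32_t i = 0; i < function_count; i++) {
        uint32_t rva = function_rvas[i];

        if (rva == 0 || rva >= image_size) {
            continue;  /* unused slot, or RVA outside the mapped image */
        }
        if (is_forwarder(rva, export_dir_rva, export_dir_size)) {
            continue;  /* patching would corrupt the forwarder string */
        }
        image[rva] = OPCODE_RET;
        patched++;
    }
    return patched;
}
```

Walking the function array rather than the name array means exports that are
only reachable by ordinal are covered too. Exported data symbols are not
distinguished from code here and would be overwritten as well.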

--[ 5. Sample code

The following is a sample DLL which performs the technique described above.
(Truncated for brevity)

  VOID CALLBACK
  OnLoad(
    _In_     ULONG             NotificationReason,
    _In_     PLDR_DLL_NOTIFICATION_DATA  NotificationData,
    _In_opt_ PVOID             Context
  )
  {
    if (NotificationReason != LDR_DLL_NOTIFICATION_REASON_LOADED) {
      return;
    }

    ...

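    IMAGE_DATA_DIRECTORY exports_entry =
        nt_header->OptionalHeader.DataDirectory[IMAGE_DIRECTORY_ENTRY_EXPORT];
    if (exports_entry.VirtualAddress == 0 || exports_entry.Size == 0) {
      // The image has no export directory, so there is nothing to patch.
      return;
    }

    PIMAGE_EXPORT_DIRECTORY exports_dir =
        (PIMAGE_EXPORT_DIRECTORY)((PBYTE)base_address + exports_entry.VirtualAddress);
    // AddressOfFunctions holds one DWORD RVA per export, named or not, so
    // walking it directly also covers exports reachable only by ordinal.
    PDWORD address_of_functions =
        (PDWORD)((PBYTE)base_address + exports_dir->AddressOfFunctions);

    for (DWORD i = 0; i < exports_dir->NumberOfFunctions; i++) {
      DWORD function_rva = address_of_functions[i];
      if (function_rva == 0) {
        continue;  // unused slot in the table
      }
      // An RVA inside the export directory is a forwarder string, not code.
      if (function_rva >= exports_entry.VirtualAddress &&
          function_rva < exports_entry.VirtualAddress + exports_entry.Size) {
        continue;
      }
      PBYTE function = (PBYTE)base_address + function_rva;

      // Make the first byte writable, patch it, then restore the protection.
      DWORD oldProtect;
      BOOL success = VirtualProtect(
        function,
        1,
        PAGE_EXECUTE_READWRITE,
        &oldProtect
      );
      if (success)
      {
        *function = 0xc3;  // RET
        VirtualProtect(
          function,
          1,
          oldProtect,
          &oldProtect
        );
      }
    }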
  }

  BOOL APIENTRY
  DllMain(
    _In_   HMODULE hModule,
    _In_   DWORD  ul_reason_for_call,
    _In_opt_ LPVOID lpReserved
  )
  {
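    HMODULE ntdll;
    LdrRegisterDllNotificationPtr ldrRegisterDllNotification;

    if (ul_reason_for_call == DLL_PROCESS_ATTACH)
    {
      // ntdll.dll is already mapped into every process, so take the existing
      // handle rather than calling back into the loader from DllMain.
      ntdll = GetModuleHandleW(L"ntdll.dll");
      if (ntdll == NULL)
      {
        return TRUE;
      }

      // GetProcAddress returns FARPROC; cast it to the registration
      // prototype (typedef not shown).
      ldrRegisterDllNotification = (LdrRegisterDllNotificationPtr)GetProcAddress(
        ntdll,
        "LdrRegisterDllNotification"
      );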
      if (ldrRegisterDllNotification == NULL)
      {
        return TRUE;
      }

      ldrRegisterDllNotification(
        0,
        &OnLoad,
        NULL,
        &g_Cookie
      );
    }

    return TRUE;
  }

--[ 6. Limitations

As previously mentioned, the Windows process model is exceptionally forgiving,
and while the mitigation outlined above will function for the vast majority of
DLL injections in use today, it is not infallible. Other processes are entirely
free to reach in and modify process memory, and can still create threads at
will. There is nothing to stop the injector simply restoring the byte
overwritten by the return instruction to its original value, or even simply
allocating some executable memory and jumping to it.

The truth of the matter is that it is effectively impossible to prevent DLL
injection, especially from Kernel mode, and any mitigation that we put in place
is merely a speed bump to a determined attacker. However, it is certainly
possible to make the task more difficult by loading your process with
mitigations such as this, in order to deter applications with less
conviction.

--[ 7. Resources

https://learn.microsoft.com/en-us/windows/win32/api/processthreadsapi/nf-processthreadsapi-createremotethread
https://learn.microsoft.com/en-us/windows/win32/dlls/dllmain
https://ferreirasc.github.io/PE-Export-Address-Table/