Reverse engineering code can be complicated enough, but things get really weird when the instructions you see being called do not execute as you expect. Today we will go over a toy example of self-mutating code and some techniques for creating mutating code whose instructions cannot easily be determined statically.
Function overwrites are fun, and they really complicate static analysis of binary objects. To get straight to the “pointer”, a function is nothing more than an address that marks the beginning of a set of instructions, and those instructions are bytes in memory that you can overwrite. Doing so generally causes problems, and many languages, such as Python or Java, do not let you overwrite a function’s instructions by traditional methods. C and C++, however, are perfect candidates, as they really don’t care what you do when writing through a pointer.
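To make the "function is just an address" point concrete, here is a minimal sketch. The names function1, FuncPtr, addressOf, and callThrough are illustrative stand-ins, not taken from the original program.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the function that later gets overwritten.
int function1() { return 1; }

using FuncPtr = int (*)();

// Returns the numeric address stored in a function pointer,
// i.e. where the function's first instruction lives.
std::uintptr_t addressOf(FuncPtr fn) {
    assert(fn != nullptr);
    return reinterpret_cast<std::uintptr_t>(fn);
}

// Calls whatever instructions currently sit at the pointed-to address.
int callThrough(FuncPtr fn) {
    assert(fn != nullptr);
    return fn();
}
```

Two pointers assigned from function1 compare equal and yield the same value from addressOf, which is why changing the bytes at that address changes what every caller executes.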
Below we can see a function called swapFunctionWithShellCode, written in C++ (Win32). It declares two function pointers, funcPtr1 and funcPtr2, and both are assigned to point to function1. Next, the function changeAddressSpacePermissions is called. Essentially, this function marks the memory where the instructions for function1 lie as writable. By default, memory holding instructions is marked read and execute only, so without this step we could not overwrite it, as the kernel would stop us due to permissions. The reason two pointers to the same function are passed in is to aid in the size calculations.
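Memory protection is applied per page, so making a function writable really means changing the protection of every page its bytes touch. The sketch below only computes that page range. The helper name pageRangeFor and the 4096-byte default page size are assumptions for illustration; the real changeAddressSpacePermissions may calculate its size differently.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

struct PageRange {
    std::uintptr_t base;  // address of the first affected page
    std::size_t size;     // bytes covered, always a multiple of pageSize
};

// Hypothetical helper: which whole pages contain [address, address + length)?
PageRange pageRangeFor(std::uintptr_t address, std::size_t length,
                       std::size_t pageSize = 4096) {
    assert(length > 0);
    // The masking below only works when pageSize is a power of two.
    assert(pageSize != 0 && (pageSize & (pageSize - 1)) == 0);

    const std::uintptr_t mask = ~static_cast<std::uintptr_t>(pageSize - 1);
    const std::uintptr_t base = address & mask;                 // round down
    const std::uintptr_t last = (address + length - 1) & mask;  // page of final byte
    return PageRange{base, static_cast<std::size_t>(last - base) + pageSize};
}
```

On Win32, a range like this is presumably what a call such as VirtualProtect with PAGE_EXECUTE_READWRITE ends up affecting. A write that straddles a page boundary touches two pages, not one.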
In the next bit of code, we can see our shellcode payload, which just returns from the function cleanly, allowing the program to continue executing. Finally, we perform a memcpy to overwrite the instructions that funcPtr1 points to with our shellcode, so the function’s instructions have changed at runtime. Then we call our mutated function.
Our Overwrite Function
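The overwrite step itself can be pictured without touching executable memory. The following is only a simulation on an ordinary byte buffer: overwritePrefix is a hypothetical helper, and the byte values used with it are assumptions, since the exact payload bytes are not listed in the text.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Simulation only: `original` stands in for a function's bytes in memory.
// Returns a copy of it with `patch` written over its first bytes,
// mirroring what memcpy does to the real function body.
std::vector<std::uint8_t> overwritePrefix(std::vector<std::uint8_t> original,
                                          const std::vector<std::uint8_t>& patch) {
    assert(patch.size() <= original.size());  // never write past the target
    if (!patch.empty()) {
        std::memcpy(original.data(), patch.data(), patch.size());
    }
    return original;
}
```

For example, patching the buffer {0x55, 0x8B, 0xEC, 0x90} with the single byte 0xC3 (the x86 near-return opcode, assumed here as the "return cleanly" payload) yields {0xC3, 0x8B, 0xEC, 0x90}. Anything that starts executing at the first byte now returns immediately, and the remaining original bytes are never reached.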
Now let’s take a look at this PE in Ghidra to see why this is complicated to analyze statically. In the example below, on line 53 we can see memcpy being called with SHELLCODE as its second argument. For simplicity, I renamed this local variable to make it easier to see; because compiled binaries usually lack symbol tables, you generally won’t have a variable name sticking out like this as the reverse engineer. Analyzing this statically, we can see that local_10, which points to a function, is being overwritten, but this isn’t something that initially stands out.
When we further analyze what the SHELLCODE local variable is pointing to, we can see the following:
Shellcode in Ghidra
As expected, the shellcode just looks like random data in our binary. It isn’t obvious that it’s shellcode, which is what we hoped for. If a reverse engineer were to pull this data out and check whether it is shellcode, they would indeed find that it is, but what if it was encrypted? Then determining what this data is would not exactly be a cakewalk. However, the function overwrite itself should be a red flag, as overwrites like this are not normal to see in binary executables.
How do anti-virus products perform against this? After all, with function overwrites being a red flag, we would hope they’d detect this and it would trigger a warning.
Windows Defender Scan Results
Windows Defender didn’t see a problem with our binary, and in fact, most vendors didn’t seem to have a problem with it when it was uploaded to VirusTotal.
VirusTotal Scan Results
While 11 AV products marked our binary as malicious, we have to remember that people will quite often assume that, because it’s only 11, the result is a false positive.
Can we do better? In the next post we will go over a way to lower this detection rate even further and to keep our shellcode from being analyzed statically, making the behavior of our binary executable much harder to determine through static analysis methods.