Anti-virus and intrusion detection systems can become really nasty during a penetration test. They are often responsible for unstable or ineffective exploit payloads, system lock-downs or even angry penetration testers ;-) . The following article is about a simple AV and IDS evasion technique that can be used to bypass pattern-based security software or hardware. It’s not meant to be an all-round solution for bypassing strong heuristic-based systems, but it’s a good starting point for further improving this encoding/obfuscation technique.
This article therefore covers shellcode encoders and decoders as part of my SecurityTube Linux Assembly Expert certification series.
Random-Byte-Insertion-XOR Encoding Scheme
The encoding scheme itself is actually quite easy. The idea is to take a random byte as the base for a XOR operation, and to chain the next XOR operation based on the result of the previous. The same goes for the 3rd and 4th byte. The following flow-graph quickly describes what’s happening during the encoding process:
First of all (before step #1 is performed), the encoder splits the input shellcode into blocks of 3 bytes each and adds a random byte (value 0x01 to 0xFF) at the beginning of each block, so that the random bytes differ from block to block. If the shellcode length is not a multiple of 3 bytes, the last block is padded with NOPs (0x90).
During the second step, the encoder XORs the first byte (the random byte) with the second byte (originally the first byte of the shellcode) and overwrites the second byte with the XOR result. The third step takes the result of the first XOR operation and XORs it with the third byte, and the last step does the same: it XORs the result of the previous XOR operation with the last byte of the block.
This results in a completely shredded-looking piece of memory :-)
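To make the chaining concrete, here is a minimal Python sketch of the padding and of encoding a single block. It is only an illustration of the steps described above, not the encoder script itself; the function names `pad_to_blocks` and `encode_block` and the fixed "random" byte 0xAA are assumptions chosen so the result is reproducible.

```python
def pad_to_blocks(shellcode: bytes) -> bytes:
    """Pad with NOPs (0x90) until the length is a multiple of 3."""
    remainder = len(shellcode) % 3
    if remainder:
        shellcode += b"\x90" * (3 - remainder)
    return shellcode


def encode_block(rand_byte: int, block: bytes) -> bytes:
    """Prepend rand_byte and chain the XORs over one 3-byte block."""
    if len(block) != 3:
        raise ValueError("block must be exactly 3 bytes")
    out = [rand_byte]
    for original in block:
        # Each output byte is the previous output byte XOR the next input byte.
        out.append(out[-1] ^ original)
    return bytes(out)


# 0xAA stands in for the random byte; 31 c0 50 is an arbitrary sample block.
print(encode_block(0xAA, b"\x31\xc0\x50").hex())  # aa9b5b0b
```

Only the first byte of each encoded block is random; the other three depend on it, which is why a different random byte changes the whole block.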
The Python Encoder
Let’s have a quick look at the Python-based encoder:
This script generates NASM-compatible shellcode output and takes care of specified bad chars, which could otherwise break the entire exploit. This example reuses the shellcode from my SLAE assignment #1, which simply binds a shell to port 1337. The script generates the following encoded shellcode, which is free of 0x00 bytes as defined by the “badchars” list:
The Shellcode Decoder
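How the bad characters are avoided is not spelled out in the text, so the following is a hedged sketch of one simple strategy: try random bytes for each block until none of the four resulting bytes is in the bad-character list. The names `encode` and `to_nasm`, the comma-separated output format and the sample bytes are assumptions for illustration, not the original script.

```python
import random


def encode(shellcode: bytes, badchars: bytes = b"\x00") -> bytes:
    """Encode shellcode block by block, avoiding every byte in badchars."""
    shellcode += b"\x90" * (-len(shellcode) % 3)  # NOP-pad to 3-byte blocks
    encoded = bytearray()
    for offset in range(0, len(shellcode), 3):
        block = shellcode[offset:offset + 3]
        candidates = list(range(0x01, 0x100))  # random byte is 0x01..0xFF
        random.shuffle(candidates)
        for rand_byte in candidates:
            chained = [rand_byte]
            for original in block:
                chained.append(chained[-1] ^ original)
            if not any(value in badchars for value in chained):
                encoded.extend(chained)
                break
        else:
            # Assumed behaviour: give up if no random byte yields a clean block.
            raise ValueError(f"no usable random byte for block at offset {offset}")
    return bytes(encoded)


def to_nasm(data: bytes) -> str:
    """Format bytes as a comma-separated list usable after a NASM 'db'."""
    return ",".join(f"0x{value:02x}" for value in data)


if __name__ == "__main__":
    sample = bytes.fromhex("31c050682f2f7368")  # placeholder bytes, not the bind shell
    print(to_nasm(encode(sample, badchars=b"\x00")))
```

Walking through a shuffled list of all 255 candidates instead of retrying forever guarantees that the loop terminates, even if the bad-character list is so large that a block cannot be encoded at all.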
To revert (decode) the encoded shellcode to its original form, a decoder stub is placed in front of the encoded shellcode, which reads and decodes the shellcode in memory and afterwards executes it. A simple way to revert it looks like the following:
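In Python, the reversal can be sketched as follows: within each 4-byte block, every original byte is the XOR of two neighbouring encoded bytes, so four encoded bytes yield three decoded ones. The function name `decode` is an assumption, and the sample input is a block built from the arbitrary bytes 31 c0 50 with 0xAA as the leading byte.

```python
def decode(encoded: bytes) -> bytes:
    """Reverse the chained XOR; any NOP padding stays in the output."""
    if len(encoded) % 4:
        raise ValueError("encoded shellcode must consist of 4-byte blocks")
    decoded = bytearray()
    for offset in range(0, len(encoded), 4):
        block = encoded[offset:offset + 4]
        # XOR each byte with its right-hand neighbour: 4 bytes in, 3 bytes out.
        decoded.extend(left ^ right for left, right in zip(block, block[1:]))
    return bytes(decoded)


print(decode(bytes.fromhex("aa9b5b0b")).hex())  # 31c050
```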
By XORing the different bytes with each other and removing the trailing byte, you can get back to the original shellcode. Now let’s get to the really interesting part: the decoder stub implementation in assembly!
First of all, the register layout will look like the following:
EAX: First operand of each XOR operation
EBX: Second operand of each XOR operation
ECX, EDX: Loop counters
ESI: Pointer to encoded shellcode
EDI: Pointer to decoded shellcode
To actually work with the encoded shellcode, one register (ESI) needs to point to its memory address. The length is needed too, but it is referenced at a later point, so I’ll skip this explanation for a moment. To get the address, the jmp-call-pop technique is used:
After the POP and MOV instructions, the registers ESI and EDI both point to the encoded shellcode. Additionally, the pointer is PUSHed onto the stack so that the shellcode can easily be executed in the last step.
Now some of the registers should be cleaned up. But remember: since we’re dealing with just the lower byte of EAX (AL) and EBX (BL), there is no need to clean out these registers, which saves a few bytes:
Next: the actual decoding function. The XOR operation is based on AL and BL because we’re only XORing one byte at a time. The first byte of the encoded shellcode (ESI) is MOVed into AL, and the next byte (ESI+1) is put into BL as the second XOR-operand:
After successfully setting up AL and BL, they can be XORed, and the resulting byte, which is stored in AL, is MOVed to the address EDI points to. Since EDI points to the encoded shellcode, we’re actually overwriting the encoded shellcode with the decoded shellcode in place, which saves a lot of memory space:
After this memory write operation, the different counters and pointers are increased to prepare XORing the next block.
Since the encoder splits the shellcode into 3-byte blocks and inserts a random value at the beginning of each, resulting in 4-byte blocks, the decoder needs to do the same: split the encoded shellcode into 4-byte blocks. This can be achieved with a simple CMP-JNE loop based on the ECX register around the XOR instructions:
This means that if CL equals 3 (that is, three XOR operations have been performed), the decoder is ready to take the next 4-byte block by increasing ESI again. This means “jumping” over the last byte of the previous block, because the decoder needs to start with the random byte again. Additionally, adding 0x4 to the outer-loop counter EDX and comparing it to the length (len) of the shellcode makes sure that the decoder properly reaches the end of the encoded shellcode at some point without running into a SIGSEGV:
Both loops are therefore used to step through all 4-byte blocks of the encoded shellcode and build the decoded shellcode live at the same memory location. Do you remember the first PUSH instruction? This instruction leads to the following finalization of the decoder, which calls the decoded shellcode:
So here’s the complete assembly decoder:
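The interplay of the two counters and the two pointers can be modelled in Python. This is only a model of the loop logic as described above, not the assembly stub: the variable names mirror the registers, the exact instruction order is assumed, and a `bytearray` stands in for the memory that ESI and EDI point into.

```python
def decode_in_place(mem: bytearray) -> int:
    """Decode 4-byte blocks in place; return the number of decoded bytes."""
    if not mem or len(mem) % 4:
        raise ValueError("expected a non-empty multiple of 4 bytes")
    esi = edi = 0  # read and write positions, both start at the first byte
    edx = 0        # outer counter: encoded bytes consumed so far
    while True:
        cl = 0     # inner counter: XOR operations done in this block
        while True:
            al = mem[esi]        # first operand
            bl = mem[esi + 1]    # second operand
            mem[edi] = al ^ bl   # write the decoded byte over the old data
            edi += 1
            esi += 1
            cl += 1
            if cl == 3:          # CMP/JNE: three XORs per block
                break
        esi += 1                 # skip the last byte of the block
        edx += 4
        if edx == len(mem):      # CMP/JNE against the shellcode length
            break
    return edi


memory = bytearray.fromhex("aa9b5b0b11795679")  # two sample blocks
count = decode_in_place(memory)
print(memory[:count].hex())  # 31c050682f2f
```

The write position never overtakes the read position, which is why overwriting the encoded bytes during decoding is safe. The length check at the top mirrors the fact that a length that is not a multiple of 4 would make the outer comparison miss its exit value.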
Let’s get to some live demonstrations:
Executing the Shellcode on Linux
Linking, objdumping and compiling can be done using my scripts from my github repository:
By using GDB, you can verify that the decoder does its work properly. At the beginning of the decoding process ESI points to the encoded shellcode:
And at the end, where the shellcode is actually called by the “call [esp]” instruction, the shellcode was decoded correctly:
Therefore the final shellcode execution is successful:
OK, proven working :-) At this point you may already have noticed that the encoded shellcode size has been nearly doubled by the encoding process. This could become important when you only have a very limited amount of space available for your shellcode.
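The growth of the encoded part can be estimated from the block layout alone: every started 3-byte block becomes 4 bytes, and the decoder stub comes on top. The helper below is a small sketch; `encoded_length` and its `stub_length` parameter are hypothetical names, and the stub size is left at 0 because it depends on the assembled decoder.

```python
def encoded_length(shellcode_length: int, stub_length: int = 0) -> int:
    """Size after encoding: 4 bytes per started 3-byte block, plus the stub."""
    blocks = -(-shellcode_length // 3)  # ceiling division
    return blocks * 4 + stub_length


for length in (3, 10, 100):
    print(length, "->", encoded_length(length))
# 3 -> 4
# 10 -> 16
# 100 -> 136
```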
Real World Cross-Platform Execution of the Shellcode
The first Linux example is quite easy because it is run in a controlled environment. Now let’s use this decoder in a real-world exploit example. You may already know one of my previous exploits in the Easy File Management Web Server, where I used a customized ROP-Exploit to pop a calc.exe. Let’s modify this exploit to demo my custom encoder/decoder.
First of all, shellcode without any previously applied encoding scheme is needed. Luckily msfvenom will do this job for us:
This outputs a plain shellcode:
If you used this shellcode in the exploit as-is, it would break the entire attack, because it contains a lot of bad chars such as 0x00. Let’s encode this shellcode first, but make sure that the list of bad characters (“badchars”) is properly set up with all forbidden bytes. In the case of this target, these are 0x00, 0x0a, 0x0b and 0x3b:
My script outputs the encoded shellcode free of all defined bad characters:
After adding the decoder stub, it even got a little bit bigger:
Now the shellcode is ready to be used in my official exploit:
By using Immunity Debugger you could easily trace the shellcode decoding process. First of all ESI points to the encoded shellcode again:
And at the point of the actual shellcode execution at “call [esp]”, ESP points to the original, decoded shellcode:
This finally results in calc.exe popping up in Easy File Management Web Server (again) :-)
This completes another SLAE mission!
This blog post has been created for completing the requirements of the SecurityTube Linux Assembly Expert certification: