fix: re-enable code hook after import dispatch#290
Conversation
|
Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA). View this failed invocation of the CLA check for more information. For the most up to date status, view the checks section at the bottom of the pull request. |
There was a problem hiding this comment.
Code Review
This pull request modifies the Windows emulator in speakeasy/windows/winemu.py to re-enable code hooks within the handle_import_func method after processing a call return. This ensures that subsequent instructions can correctly unmap sentinel ranges, preventing adjacent sentinel calls from executing bytes in the reserved emulation memory instead of being trapped as imports. I have no feedback to provide as there were no review comments.
|
great report @g0mxxm is there any chance you can share the sample/its hash so i can 1. verify myself, and 2. add it as a regression test? |
|
see test case in #291 |
|
thanks @g0mxxm! |
Sorry for the delayed response — I've been quite busy recently and wasn't able to reply in time. I reviewed the test case in #291 — the way you captured this class of issues with a single regression test is very clever, and it's something I learned a lot from. When I submitted #290, I was mainly focused on fixing the immediate crash and hadn't yet thought about the issue from the emulator runtime/regression coverage perspective. Now I'm really curious to find the original sample and confirm whether it was also SEH handling that triggered the issue during emulation. I'll take a look next week through my previous records and environments to see if I can locate it, though I'm not sure if the VM snapshot I used for the original analysis still exists. If I'm able to recover the sample or related traces, I'll reply here with the details. Thanks again for the test case — it really helped me understand the issue more deeply. |
|
no worries! it seemed like hitting the bug took a somewhat unusual scenario with a specific ordering of IAT vs manually resolved API calls, so the test case was as "simple" as i could get, even though it's still somewhat convoluted. i'm not surprised that this scenario presents itself in packed programs, and i'm really happy you raised the issue. thank you! |
|
I found the sample that triggered the crash — it is a malicious program packed with a modified/custom UPX packer. Due to certain restrictions, I cannot share the sample directly, but I can provide detailed crash information along with the corresponding assembly snippet from the sample. Strangely, today I re-tested the same sample against the pre-fix code but was unable to reproduce the crash. Based on the JSON report saved at the time, the crash occurred during the dynamic import resolution phase of the packed stub. The stub resolves imports in a loop — here is the relevant disassembly: 0x14042d12f mov al, [rdi] ; read next API name / ordinal descriptor
0x14042d131 inc rdi
0x14042d134 or al, al
0x14042d136 jz short loc_14042D10F
0x14042d144 mov rcx, rdi ; API name pointer
0x14042d147 mov rdx, rdi
0x14042d14a dec eax
0x14042d14c repne scasb ; skip over string
0x14042d14e mov rcx, rbp ; hModule = kernel32 base
0x14042d151 call qword ptr cs:14042E274h ; GetProcAddress
0x14042d157 or rax, rax ; return point
0x14042d15c mov [rbx], rax ; write runtime IAT slot
0x14042d15f add rbx, 8
0x14042d163 jmp short loc_14042D12F ; continue resolving next APIThe 95th logged API event, The crash JSON report confirms this — note "error": {
"type": "invalid_fetch",
"pc": "0xfeee2fd2",
"instr": "add byte ptr [rax], al",
"address": "0xfeee3000",
"access_type": "fetch",
"regs": {
"rsp": "0x00000000012fff60",
"rbp": "0x0000000077000000",
"rip": "0x00000000feee2fd2",
"rsi": "0x0000000140001000",
"rdi": "0x000000014042a6a2",
"rax": "0x00000000feedf300",
"rbx": "0x0000000140141360",
"rcx": "0x0000000077000000",
"rdx": "0x000000014042a68a"
},
"address_region": {
"tag": "unknown",
"base": "0xfeee3000",
"size": "0x1000",
"prot": "rwx"
}
}I'm not entirely sure what exactly happened — some mysterious timing factor perhaps. The same sample did not crash again when executing the same path, so the original crash appears to have depended on a specific runtime state window that is hard to reproduce deterministically. |
|
thanks @g0mxxm for the update. please don't hesitate to reach out if you encounter and other strange behaviors |
Summary
Speakeasy v2 can leave the import-sentinel reserved range mapped after returning from an API/import dispatch. When the next sentinel address is executed, Unicorn may treat bytes in the reserved
0xfeee...range as normal executable code instead of trapping back into Speakeasy's import handler.Crash record
The failure I observed ended with:
Immediately before the crash, execution had already gone through dynamically resolved import sentinels in the
0xfeedf...range. The crash PC being inside the reserved sentinel area suggests the reserved range was still mapped and executable when control reached a later sentinel.Why this looks wrong
The sentinel range should normally be used as a trap mechanism for imported API calls. A call into a sentinel address should cause an unmapped-fetch/import-dispatch path, not execute bytes in the reserved region as real instructions.
In this failing run, the emulator instead executed inside the reserved range until it reached the boundary at
0xfeee3000, where the invalid fetch occurred.Suspected fix
Re-enabling the code hook after successful API/import dispatch appears to restore the expected behavior. With that change, the next instruction can pass through the code-hook path and the sentinel range can be unmapped again before a following sentinel call.
A PR with the proposed fix is here:
#290
Validation
With the proposed patch applied:
The same packed-PE emulation path also completed without the
0xfeee3000invalid fetch:Thanks for maintaining Speakeasy and for reviewing this report.