Skip to content

fix: re-enable code hook after import dispatch#290

Merged
williballenthin merged 1 commit into
mandiant:masterfrom
g0mxxm:fix-import-sentinel-code-hook
May 20, 2026
Merged

fix: re-enable code hook after import dispatch#290
williballenthin merged 1 commit into
mandiant:masterfrom
g0mxxm:fix-import-sentinel-code-hook

Conversation

@g0mxxm

@g0mxxm g0mxxm commented May 19, 2026

Copy link
Copy Markdown

Summary

Speakeasy v2 can leave the import-sentinel reserved range mapped after returning from an API/import dispatch. When the next sentinel address is executed, Unicorn may treat bytes in the reserved 0xfeee... range as normal executable code instead of trapping back into Speakeasy's import handler.

Crash record

The failure I observed ended with:

invalid_fetch
pc = 0xfeee2fd2
instr = add byte ptr [rax], al
fetch address = 0xfeee3000
address_region = unknown [0xfeee3000-0xfeee3fff], prot=rwx
thread_id = 1076
process_id = 1056

Immediately before the crash, execution had already gone through dynamically resolved import sentinels in the 0xfeedf... range. The crash PC being inside the reserved sentinel area suggests the reserved range was still mapped and executable when control reached a later sentinel.

Why this looks wrong

The sentinel range should normally be used as a trap mechanism for imported API calls. A call into a sentinel address should cause an unmapped-fetch/import-dispatch path, not execute bytes in the reserved region as real instructions.

In this failing run, the emulator instead executed inside the reserved range until it reached the boundary at 0xfeee3000, where the invalid fetch occurred.

Suspected fix

Re-enabling the code hook after successful API/import dispatch appears to restore the expected behavior. With that change, the next instruction can pass through the code-hook path and the sentinel range can be unmapped again before a following sentinel call.

A PR with the proposed fix is here:

#290

Validation

With the proposed patch applied:

uvx ruff check          -> passed
uvx ruff format --check -> passed
python -m pytest tests/ -> 90 passed, 1 skipped

The same packed-PE emulation path also completed without the 0xfeee3000 invalid fetch:

ret_val = 0x0
error = None
events = 516

Thanks for maintaining Speakeasy and for reviewing this report.

@google-cla

google-cla Bot commented May 19, 2026

Copy link
Copy Markdown

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request modifies the Windows emulator in speakeasy/windows/winemu.py to re-enable code hooks within the handle_import_func method after processing a call return. This ensures that subsequent instructions can correctly unmap sentinel ranges, preventing adjacent sentinel calls from executing bytes in the reserved emulation memory instead of being trapped as imports. I have no feedback to provide as there were no review comments.

@williballenthin

Copy link
Copy Markdown
Collaborator

great report @g0mxxm

is there any chance you can share the sample/its hash so i can 1. verify myself, and 2. add it as a regression test?

@williballenthin

Copy link
Copy Markdown
Collaborator

see test case in #291

@williballenthin williballenthin merged commit 9b42ace into mandiant:master May 20, 2026
3 checks passed
@williballenthin

Copy link
Copy Markdown
Collaborator

thanks @g0mxxm!

pull Bot pushed a commit to A-Archives-and-Forks/speakeasy that referenced this pull request May 20, 2026
@g0mxxm

g0mxxm commented May 23, 2026

Copy link
Copy Markdown
Author

see test case in #291

Sorry for the delayed response — I've been quite busy recently and wasn't able to reply in time.

I reviewed the test case in #291 — the way you captured this class of issues with a single regression test is very clever, and it's something I learned a lot from. When I submitted #290, I was mainly focused on fixing the immediate crash and hadn't yet thought about the issue from the emulator runtime/regression coverage perspective. Now I'm really curious to find the original sample and confirm whether it was also SEH handling that triggered the issue during emulation. I'll take a look next week through my previous records and environments to see if I can locate it, though I'm not sure if the VM snapshot I used for the original analysis still exists.

If I'm able to recover the sample or related traces, I'll reply here with the details. Thanks again for the test case — it really helped me understand the issue more deeply.

@williballenthin

Copy link
Copy Markdown
Collaborator

no worries!

it seemed like hitting the bug took a somewhat unusual scenario with a specific ordering of IAT vs manually resolved API calls, so the test case was as "simple" as i could get, even though it's still somewhat convoluted. i'm not surprised that this scenario presents itself in packed programs, and i'm really happy you raised the issue. thank you!

@g0mxxm

g0mxxm commented May 25, 2026

Copy link
Copy Markdown
Author

I found the sample that triggered the crash — it is a malicious program packed with a modified/custom UPX packer. Due to certain restrictions, I cannot share the sample directly, but I can provide detailed crash information along with the corresponding assembly snippet from the sample.

Strangely, today I re-tested the same sample against the pre-fix code but was unable to reproduce the crash. Based on the JSON report saved at the time, the crash occurred during the dynamic import resolution phase of the packed stub. The stub resolves imports in a loop — here is the relevant disassembly:

0x14042d12f  mov al, [rdi]                 ; read next API name / ordinal descriptor
0x14042d131  inc rdi
0x14042d134  or al, al
0x14042d136  jz short loc_14042D10F
0x14042d144  mov rcx, rdi                  ; API name pointer
0x14042d147  mov rdx, rdi
0x14042d14a  dec eax
0x14042d14c  repne scasb                   ; skip over string
0x14042d14e  mov rcx, rbp                  ; hModule = kernel32 base
0x14042d151  call qword ptr cs:14042E274h  ; GetProcAddress
0x14042d157  or rax, rax                   ; return point
0x14042d15c  mov [rbx], rax                ; write runtime IAT slot
0x14042d15f  add rbx, 8
0x14042d163  jmp short loc_14042D12F       ; continue resolving next API

The 95th logged API event, GetProcAddress("SetConsoleCursorPosition") returned a sentinel address (0xfeedf324). The old code likely did not re-enable the code hook after this call returned, causing Speakeasy's reserved/sentinel region to not be restored to a trap path in time. Unicorn subsequently treated the 0xfeee... reserved pages as normal code and continued executing, ultimately triggering an invalid_fetch at 0xfeee3000 near the 60-second timeout.

The crash JSON report confirms this — note rip is executing inside the sentinel range (0xfeee2fd2) and the access target is 0xfeee3000:

"error": {
    "type": "invalid_fetch",
    "pc": "0xfeee2fd2",
    "instr": "add byte ptr [rax], al",
    "address": "0xfeee3000",
    "access_type": "fetch",
    "regs": {
        "rsp": "0x00000000012fff60",
        "rbp": "0x0000000077000000",
        "rip": "0x00000000feee2fd2",
        "rsi": "0x0000000140001000",
        "rdi": "0x000000014042a6a2",
        "rax": "0x00000000feedf300",
        "rbx": "0x0000000140141360",
        "rcx": "0x0000000077000000",
        "rdx": "0x000000014042a68a"
    },
    "address_region": {
        "tag": "unknown",
        "base": "0xfeee3000",
        "size": "0x1000",
        "prot": "rwx"
    }
}

I'm not entirely sure what exactly happened — some mysterious timing factor perhaps. The same sample did not crash again when executing the same path, so the original crash appears to have depended on a specific runtime state window that is hard to reproduce deterministically.

@williballenthin

Copy link
Copy Markdown
Collaborator

thanks @g0mxxm for the update. please don't hesitate to reach out if you encounter and other strange behaviors

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants