nbaertsch/COMegon

COMegon

Dual-mode API dispatch and sleep framework that routes and queues arbitrary Windows API calls through COM/RPC internals. Similar to thread-pool proxying but with the ability to queue calls as RPC messages, allowing for sleep-time encryption without the NtContinue/CONTEXT pattern.

What It Does

COMegon leverages the COM runtime's own internal dispatch machinery to execute API calls on your behalf. From the OS perspective, every call originates from combase.dll → rpcrt4.dll → ntdll.dll. COM is all vtables, so COMegon recreates the necessary COM data structures to provide controlled vtable lookup and execution. COM itself manages the nuances of calls with more than four arguments (those passed on the stack), and return value capture is native to the COM runtime, so no trampolines or hacks are required, unlike capturing results from thread-pool proxied calls.

Modes

Sleep Mode (Batch Fire-and-Forget)

Queue multiple API calls, then execute them all in a single pump cycle. The calling thread's fiber is suspended while dispatch occurs on a dedicated pump fiber whose stack contains only system frames.

Use case: Sleep encryption chains. Queue VirtualProtect → SystemFunction032 → NtDelayExecution → SystemFunction032 → VirtualProtect, then sleep. Your .text section can be encrypted because no user code exists on any active callstack. The implemented example is naive regarding heap encryption, but the primitives are available to you.

Proxy Mode (Synchronous Dispatch)

Execute a single API call and retrieve the result. The call is dispatched through the same COM channel but returns synchronously to the caller.

Use case: Runtime stack smuggling. Any sensitive API call gets a clean system-only stack that looks like COM-isms. The 'pump' execution is initiated via a fiber, so you'll see the fiber entry at the bottom of the stack.

CLSID Configuration

COMegon registers a COM proxy/stub factory via CoRegisterClassObject using a CLSID that must not be present in the host system's registry (HKCR\CLSID). If the CLSID matches a registered COM class, COM's PS factory resolution loads the real DLL instead of our in-process factory, breaking initialization.

Detection Testing

COMegon's COM-based sleep primitive was tested against four open-source beacon hunting and memory scanning tools during active sleep/wake cycling (5 × 20s COM sleep + 10s wake, elevated scans with SeDebugPrivilege):

| Tool | Author | Checks | Result |
| --- | --- | --- | --- |
| Hunt-Sleeping-Beacons | thefLink | Unbacked stack frames, non-exec pages in stack, stomped modules (CoW), APC dispatch on stack, timer callback enumeration, return address spoofing, abnormal intermodular calls | ✅ Clean |
| pe-sieve | hasherezade | Implanted PEs, shellcode (pattern+stats), inline hooks, IAT hooks, patched headers, thread anomalies | ✅ Clean |
| Moneta | forrest-orr | Private RWX memory, Copy-on-Write anomalies, modified code sections, unbacked executable regions | ✅ Clean |
| Patriot | joe-desimone | Suspicious CONTEXT structures (Ekko/Foliage), unbacked executable regions, modified code (stomping), PE integrity | ✅ Clean |

* These results reflect the COMegon dispatch primitive in isolation. Where the calling code runs from (injected memory, stomped module, on-disk PE, etc.) is up to the user and will independently affect detection coverage by these tools and EDRs.

Architecture details

COMegon abuses the COM runtime's cross-apartment RPC dispatch to execute arbitrary function pointers through a fully legitimate system callstack. No hooks, no trampolines, no shellcode: just carefully crafted MIDL metadata that tells NdrStubCall2 to call our functions as if they were COM method implementations.

Layered Design

Layer 1: Sig Scanner        - Dynamic signature scanning of combase.dll .text section to resolve
                               internal functions (ModalLoop, CCliModalLoopCtor, PostCall, NoOpReturn0)
                               with version-gated fallback for known-good builds
Layer 2: Format Strings     - Build NDR proc format strings at runtime describing each dispatch
                               slot's parameter layout (param count, stack sizes, Oi2 header)
Layer 3: MIDL Tables        - Heap-allocated MIDL_SERVER_INFO, MIDL_STUB_DESC, RPC_SERVER_INTERFACE,
                               dispatch tables, format string offsets, CStdStubBuffer, stub vtable
Layer 4: COM Plumbing       - IPSFactoryBuffer, IRpcProxyBuffer, IClassFactory, ISynchronize:
                               fake COM objects that satisfy the runtime's QueryInterface/CreateStub/
                               CreateProxy calls during CoMarshalInterThreadInterfaceInStream
Layer 5: Caller STA Pump    - Caller thread does CoInitializeEx(COINIT_APARTMENTTHREADED), registers our
                               PSFactory via CoRegisterClassObject, marshals IFiberDispatch, then
                               builds CCliModalLoop + switches directly into a ModalLoop pump fiber
Layer 6: PostCall Worker    - Short-lived MTA worker: GetBuffer → fill RPCOLEMESSAGE → PostCall for
                               each queued call, writes SOleTlsData PID and fake ISynchronize, exits
Layer 7: SendReceive Worker - (WIP) MTA worker: GetBuffer → SendReceive → FreeBuffer for synchronous
                               single-call proxy mode
Layer 8: Public API         - init(), queue(), pump() [working], invoke() [working], deinit()

Public API

// Initialize: initializes caller as STA, registers COM plumbing, marshals channel
var ctx = try comegon.init(.{ .max_slots = 256 });
defer ctx.deinit();

// Sleep mode: queue N calls, fire them all, block until done
// No return values: fire-and-forget. The last call is an auto-appended SwitchToFiber(caller_fiber).
ctx.queue(fn_VirtualProtect, &.{ text_base, text_size, PAGE_RW, &old_prot });
ctx.queue(fn_SystemFunction032, &.{ &img_range, &key_range });
ctx.queue(fn_NtDelayExecution, &.{ 0, &delay });
ctx.queue(fn_SystemFunction032, &.{ &img_range, &key_range });
ctx.queue(fn_VirtualProtect, &.{ text_base, text_size, PAGE_RX, &old_prot });
ctx.pump();  // blocks until all 5 calls + SwitchToFiber(caller_fiber) complete

// Proxy mode (WIP): single synchronous call, captures HRESULT return value
// Dispatches via SendReceive (synchronous) + PostCall(SwitchToFiber) for cleanup
const status = ctx.invoke(fn_NtAllocateVirtualMemory, &.{ process, &base, 0, &size, MEM_COMMIT, PAGE_RW });

| Function | Mode | Transport | Returns | Status |
| --- | --- | --- | --- | --- |
| queue() + pump() | Sleep | PostCall (async) | void | ✅ Working |
| invoke() | Proxy | SendReceive (sync) | HRESULT (u32) | ✅ Working |
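
The `&delay` argument queued for NtDelayExecution above points at a 64-bit interval. NtDelayExecution takes that interval in 100-nanosecond units, with a negative value meaning "relative to now". A minimal sketch of the conversion follows; the helper name `relativeDelay` is hypothetical and not part of COMegon, and it assumes `delay` is a plain `i64`:

```zig
const std = @import("std");

/// Convert a millisecond duration into the relative interval format
/// NtDelayExecution expects: negative, in 100-nanosecond units.
/// (Hypothetical helper, shown only for illustration.)
fn relativeDelay(ms: u32) i64 {
    // 1 ms = 10_000 units of 100 ns; the sign flip marks it as relative.
    return -@as(i64, ms) * 10_000;
}

pub fn main() void {
    // The 20 s sleep used in the detection tests would be expressed as:
    const delay: i64 = relativeDelay(20_000);
    std.debug.print("delay = {d}\n", .{delay});
}
```

For a 20-second sleep this yields -200000000, which is the value `delay` would hold before its address is passed to `queue()`.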

The Dispatch Chain (Sleep Mode)

               Short-lived MTA Worker                   Caller Thread / Pump Fiber
               ───────────────────────                   ───────────────────────────

               GetBuffer(channel, &msg, IID)
               msg.iMethod = RESERVED_METHODS + slot
               PostCall(channel, &msg, &sync, pid)  ──►  WM_USER+0x00 posted to caller STA window
                  ... repeat for each queued call ...
               worker exits before dispatch begins

                                                      Build CML + CreateFiber(ModalLoop, ...)
                                                             SwitchToFiber(pump_fiber)
                                                                    │
                                                      ┌─────────────▼───────────────────────────┐
                                                      │ combase!ModalLoop                       │
                                                      │   └─► combase!BlockFn                   │
                                                      │         ├─ WaitForMultipleObjects       │
                                                      │         ├─ GetQueueStatus               │
                                                      │         ├─ PeekRPCAndDDEMessage         │
                                                      │         └─ DispatchMessageW             │
                                                      │              └─► ThreadWndProc          │
                                                      │                   └─► ThreadDisp        │
                                                      │                        └─► CIFLAI       │
                                                      │                   ┌────────────┐        │
                                                      │                   │CStdStub_Inv│        │
                                                      │                   │  └─NdrStub │        │
                                                      │                   │    └─YOUR  │        │
                                                      │                   │      API   │        │
                                                      │                   └────────────┘        │
                                                      └─────────────────────────────────────────┘

Full callstack during dispatch (what ETW/debugger sees):

ntdll!NtDelayExecution           ← your function
rpcrt4!NdrStubCall2              ← NDR unmarshals params, calls dispatch_table[method]
combase!CStdStubBuffer_Invoke    ← stub vtable[5], routes to NdrStubCall2 via MIDL_SERVER_INFO
combase!SyncStubInvoke
combase!StubInvoke
combase!ComInvokeWithLockAndIPID ← CIFLAI: finds IPID entry, resolves stub, calls Invoke
combase!ThreadDispatch
combase!ThreadWndProc            ← STA COM hidden window message handler
user32!DispatchMessageW
combase!PeekRPCAndDDEMessage
combase!CCliModalLoop::BlockFn   ← pumps messages in a wait loop
combase!ModalLoop                ← top-level RPC pump (entry point of pump fiber)
ntdll!BaseFiberStart             ← fiber entry

Key Internal Structures

CStdStubBuffer (fake_stub, 0x48 bytes) is the structure CStdStubBuffer_Invoke receives as this:

+0x00  pvServerObject      → fake_server_obj (whose [0] → server_dispatch_array)
+0x08  vtbl_ptr            → stub vtable (CStdStubBuffer_QI/AddRef/Connect/.../Invoke)
+0x10  ref_count           = 1
+0x18  pvServerObject2     → fake_server_obj (refreshed before each pump cycle)
+0x40  pHeader             → CInterfaceStubHeader

CInterfaceStubHeader (precedes stub vtable in memory):

+0x00  piid                → IID_IFiberDispatch
+0x08  pServerInfo         → MIDL_SERVER_INFO
+0x10  DispatchTableCount  = max_slots
+0x14  _pad                = 0
+0x18  pDispatchTable      = -1  ← CRITICAL: must be -1 for NDR path (0 = skip dispatch)

MIDL_SERVER_INFO tells NdrStubCall2 where to find everything:

+0x00  pStubDesc           → MIDL_STUB_DESC (alloc/free funcs, format types)
+0x08  DispatchTable       → [fn_ptr, fn_ptr, ...]  ← YOUR function pointers
+0x10  ProcString          → all_fmt buffer (NDR format strings per method)
+0x18  FmtStringOffset     → [offset, offset, ...]  ← per-method offset into ProcString
+0x20  ThunkTable          = null
+0x28  pTransferSyntax     → NDR 2.0 syntax GUID {8A885D04-1CEB-11C9-...}

Fake CSyncClientCall (0x120 bytes, passed to ModalLoop as fiber param):

+0xC0   inner_obj ptr      → fake COM object (all vtable slots → combase!NoOpReturn0)
+0x108  pump_event handle   ← signaled to make BlockFn's WaitForMultipleObjectsEx return

CCliModalLoop (constructed via CCliModalLoopCtor(buf, 0, 0x04FF, 0, 1)) is written into the caller thread's TLS chain: SOleTlsData.pCAptCallCtrl._pTopCML → cml_buf. BlockFn reads cml_buf[0x18] for the return code (RPC_S_CALLPENDING = keep pumping) and cml_buf[0x108] for the pump event handle. fCoWaitCalled=1 ensures ModalLoop dispatches pre-queued messages.

NDR Format Strings

Each dispatch slot gets a dynamically-built NDR Oi2 proc format string that describes the method's parameter layout. NdrStubCall2 parses this to determine stack frame size and unmarshals parameters from the RPCOLEMESSAGE buffer into the call.

Format structure per method:

[0x32, 0x48]                           ← Oi2 header (handle_type, oi_flags)
[rpc_flags: u32]                       ← 0
[method_num: u16]                      ← RESERVED_METHODS + slot
[stack_size: u16]                      ← (n_params + 1) * 8
[0, 0, 0, 0, 0, 0]                    ← excount, padding
[n_params + 1, 0]                      ← param count (including return)
Per-param (6 bytes each):
  [FC_HYPER, 0x08/0x48, stack_offset: u16, 0, 0]  ← 8-byte param at offset
Return:
  [FC_LONG, 0x52, stack_offset: u16, 0, 0]         ← HRESULT return
[0x5B, 0x5C, 0x00]                    ← FC_END, FC_PAD, terminator
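
Taking the layout above at face value, the `stack_size` field and the total byte length of one slot's format string follow from simple arithmetic. The sketch below only works through those sizes; the names `stackSize` and `procFormatLen` are hypothetical and not taken from the COMegon source:

```zig
const std = @import("std");

// Fixed-size prefix, per the layout above:
// Oi2 header (2) + rpc_flags (4) + method_num (2) + stack_size (2)
// + excount/padding (6) + param count (2) = 18 bytes.
const header_len: usize = 2 + 4 + 2 + 2 + 6 + 2;
// One 6-byte descriptor per parameter, plus one for the return value.
const param_desc_len: usize = 6;
// Trailing 0x5B, 0x5C, 0x00.
const terminator_len: usize = 3;

/// Value of the stack_size field for a slot with `n_params` 8-byte arguments.
fn stackSize(n_params: u16) u16 {
    return (n_params + 1) * 8;
}

/// Total bytes occupied by one slot's proc format string.
fn procFormatLen(n_params: u16) usize {
    return header_len + param_desc_len * (@as(usize, n_params) + 1) + terminator_len;
}

pub fn main() void {
    // VirtualProtect takes four arguments.
    const n: u16 = 4;
    std.debug.print("stack_size={d} format_len={d}\n", .{ stackSize(n), procFormatLen(n) });
}
```

For a four-argument call this gives a `stack_size` of 40 and a 51-byte format string (18 + 5 × 6 + 3), which is also the stride to expect between consecutive FmtStringOffset entries if every slot is built with the same parameter count.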

COM Channel Setup

  1. Caller thread initializes apartment-threaded COM, registers a fake IPSFactoryBuffer under a custom CLSID via CoRegisterClassObject, then marshals a fake IFiberDispatch object via CoMarshalInterThreadInterfaceInStream
  2. CoIncrementMTAUsage pins the process MTA alive without keeping a persistent worker thread resident
  3. A short-lived MTA helper thread calls CoGetInterfaceAndReleaseStream to unmarshal: COM calls our IPSFactoryBuffer::CreateProxy, connects the IRpcChannelBuffer (which we capture for later GetBuffer/PostCall/SendReceive calls), then the helper exits
  4. The IRpcChannelBuffer is now a live cross-apartment channel. GetBuffer allocates an RPCOLEMESSAGE, PostCall fires it as an async message to the caller STA's hidden COM window

Sleep Mode Pump Cycle

  1. Caller queue()s N calls → stored as {method_slot, fn_ptr, n_params}
  2. pump() refreshes the COM channel, auto-appends SwitchToFiber(caller_fiber), and starts a short-lived MTA worker
  3. The worker posts all N calls via PostCall (each writes method index into RPCOLEMESSAGE.iMethod, sets wParam via SOleTlsData PID pattern, provides fake ISynchronize), then exits
  4. Caller builds CML directly, creates the pump fiber (CreateFiber(ModalLoop, &fake_client_call)), and switches to it
  5. ModalLoop → BlockFn pumps the caller STA message queue, dispatching each WM_USER through ThreadWndProc → ComInvokeWithLockAndIPID → CStdStubBuffer_Invoke → NdrStubCall2 → dispatch_table[iMethod] → your function
  6. The last queued call is SwitchToFiber(caller_fiber), which returns control to the caller; the caller then destroys the suspended pump fiber (DeleteFiber) and completes

Build

Requires Zig (tested with 0.14.x):

zig build-exe comegon.zig -target x86_64-windows -O ReleaseSmall --name comegon_test

All optimization modes are supported: Debug, ReleaseSmall, ReleaseSafe, and ReleaseFast are all verified stable (30/30 each).

CET-Enforced Build

Zig 0.14.x has no native --cetcompat flag. Use a two-step build to opt into CET shadow stack enforcement:

# Step 1: Compile to COFF object
zig build-obj comegon.zig -target x86_64-windows -O ReleaseSmall --name comegon

# Step 2: Link with /cetcompat (reuses Zig's bundled lld-link)
zig lld-link -lldmingw -ERRORLIMIT:0 -NOLOGO -MLLVM:-float-abi=hard \
    -STACK:16777216 -BASE:5368709120 -BUILD-ID:NO -MACHINE:X64 -BREPRO \
    -OUT:comegon_cet.exe -SUBSYSTEM:console,6.0 -NODEFAULTLIB \
    -ENTRY:wWinMainCRTStartup /cetcompat \
    comegon.obj compiler_rt.lib ntdll.lib kernel32.lib

Tip: Run a normal zig build-exe --verbose-link first to get the exact lib paths for compiler_rt.lib, ntdll.lib, and kernel32.lib from your local Zig cache.

The /cetcompat flag sets IMAGE_DLLCHARACTERISTICS_EX_CET_COMPAT (0x1) in the PE debug directory, opting the binary into CET shadow stack enforcement on HVCI-enabled machines.

Requirements

  • Windows 10/11 x64
  • COM runtime (combase.dll, rpcrt4.dll), present on all Windows installations

License

Private: not for redistribution.
