RFC: Dynamic Linking Support #205

zyma98 · 2026-01-05T19:17:41Z · Maintainer

RFC: Dynamic Linking Support

Feature name: dynamic-linking
Start date: 2026-01-05

Summary

Enable the Pie engine to load inferlet libraries in WASM binary form and resolve dependencies at runtime. This feature aims to improve memory efficiency and inferlet spawning latency.

Motivation

Pie has started to support inferlets written in JavaScript and Python. Inferlets written in these interpreted languages are significantly larger because, at the current stage, they must bundle the interpreter and language runtime. This not only bloats the inferlet binary to several tens of MB but also increases the inferlet launch time to a few seconds.

One possible solution to the problem is to implement dynamic linking, where we separate the interpreter and language runtime into a library that is loaded ahead of the application binary. There will be only one instantiated copy of the library in the engine, so both memory and storage usage will be efficient. Also, by splitting out the interpreter and language runtime, the application binary will be small, so launch time will be significantly reduced.

Mechanism

On the client side, two new commands will be added to the pie-cli program:

The load command, which uploads a library to the Pie engine. The library is allowed to have interface dependencies on host-provided functions or on exported interfaces of already loaded libraries.
The purge command, which removes all loaded libraries from the Pie engine. This command is allowed to run only when the engine is quiescent, meaning that no application inferlet is running.

Upon a load command, the engine will receive a library inferlet. To make the exported interfaces of this library available to subsequently loaded libraries or launched applications, the engine will create a shim, i.e., host-provided interfaces that forward calls to the library.

The shim sounds complicated, but it is necessary because an inferlet, as a WASM component, cannot directly call into another WASM component (unless the components are composed into a single binary and become a single component, but that is static linking and defeats our purpose). The only two feasible operations are calling from an inferlet into host-provided functions and calling from the host into functions exported by an inferlet.

Therefore, to glue dynamic libraries and applications together, the engine defines host-provided functions with exactly the same signatures as the library's exported functions; this is the aforementioned shim. Applications call the host-provided shim, transferring control flow from the application WASM to the host, and the host then calls the library, transferring control back to the WASM world.
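To make the forwarding path concrete, here is a minimal std-only Rust model of the shim. It is a sketch, not the engine's implementation, and it does not use wasmtime: `Library`, `ShimTable`, and the key `std:text/formatter#format` are hypothetical stand-ins for a loaded library instance, the host-provided shim definitions, and one exported function. Applications only ever call the shim, and the shim forwards into the single shared library instance.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

/// Hypothetical stand-in for an instantiated library component.
/// The call counter shows that one shared instance serves every caller.
struct Library {
    calls: Mutex<u64>,
}

impl Library {
    /// A hypothetical "exported" library function.
    fn format(&self, input: &str) -> String {
        *self.calls.lock().unwrap() += 1;
        format!("<formatted>{input}</formatted>")
    }
}

/// A host-provided shim with the same signature as the library export.
type ShimFn = Box<dyn Fn(&str) -> String + Send + Sync>;

/// Hypothetical table of host functions, keyed by "interface#function".
#[derive(Default)]
struct ShimTable {
    shims: HashMap<String, ShimFn>,
}

impl ShimTable {
    /// On `load`, register a shim that captures the shared instance and
    /// forwards every call to it.
    fn register_library(&mut self, lib: Arc<Library>) {
        self.shims.insert(
            "std:text/formatter#format".to_string(),
            Box::new(move |input: &str| lib.format(input)),
        );
    }

    /// What an application's import resolves to: a host function.
    fn call(&self, name: &str, input: &str) -> Option<String> {
        self.shims.get(name).map(|shim| shim(input))
    }
}

fn main() {
    let lib = Arc::new(Library { calls: Mutex::new(0) });
    let mut host = ShimTable::default();
    host.register_library(Arc::clone(&lib));

    // Two "applications" call through the same shim into one library instance.
    let a = host.call("std:text/formatter#format", "app A");
    let b = host.call("std:text/formatter#format", "app B");
    println!("{a:?} {b:?}, library calls = {}", lib.calls.lock().unwrap());
}
```

Each crossing in this model is two hops (application to host, host to library), which is exactly the mediation cost listed under Potential Drawbacks.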

We provide a purge command that removes all loaded libraries, rather than an unload command that removes a specified library, because of a limitation of wasmtime::Linker: the linker can only add definitions and cannot remove them, so removing one library at a time is not possible. The purge command will be implemented by dropping the old Linker and Store and creating new ones, effectively clearing everything.
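The purge semantics can be sketched the same way. The model below is hypothetical: `AppendOnlyLinker` only mimics the add-but-never-remove property described above, `Engine` is not the real engine type, and `running_apps` stands in for however the engine actually tracks quiescence.

```rust
use std::collections::HashMap;

/// Hypothetical model of an append-only linker: definitions can be added
/// but never removed (there is deliberately no `remove` method).
#[derive(Default)]
struct AppendOnlyLinker {
    definitions: HashMap<String, String>, // import name -> providing library
}

impl AppendOnlyLinker {
    fn define(&mut self, name: &str, provider: &str) -> Result<(), String> {
        if self.definitions.contains_key(name) {
            return Err(format!("{name} is already defined"));
        }
        self.definitions.insert(name.to_string(), provider.to_string());
        Ok(())
    }
}

/// Hypothetical engine state: the linker plus a count of running applications.
#[derive(Default)]
struct Engine {
    linker: AppendOnlyLinker,
    running_apps: usize,
}

impl Engine {
    fn load(&mut self, library: &str, exports: &[&str]) -> Result<(), String> {
        for export in exports {
            self.linker.define(export, library)?;
        }
        Ok(())
    }

    /// Only allowed when quiescent. Instead of removing definitions one by
    /// one, drop the old linker (in the real engine, also the Store) and
    /// start from a fresh one.
    fn purge(&mut self) -> Result<(), String> {
        if self.running_apps > 0 {
            return Err(format!("{} application(s) still running", self.running_apps));
        }
        self.linker = AppendOnlyLinker::default();
        Ok(())
    }
}

fn main() {
    let mut engine = Engine::default();
    engine.load("python-runtime", &["python:runtime/eval"]).unwrap();

    engine.running_apps = 1;
    assert!(engine.purge().is_err()); // refused: engine is not quiescent

    engine.running_apps = 0;
    engine.purge().unwrap();
    // After a purge, the same library can be loaded again.
    engine.load("python-runtime", &["python:runtime/eval"]).unwrap();
    println!("definitions: {}", engine.linker.definitions.len());
}
```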

Potential Drawbacks

Apart from increased complexity in the engine, one major drawback of the dynamic linking approach is the need for host mediation when function calls cross WASM boundaries. We need to first implement dynamic linking to study its performance impact.

ingim · 2026-01-06T20:36:20Z · Maintainer

I feel this RFC reveals a more fundamental issue in Pie: the lack of inferlet composability and polymorphism.

Having thought about the problem for a while, I propose a slightly different idea to approach this: Nested Inferlet Calls

Motivation
Since WASM instance launch time is on the order of a few microseconds (assuming the binary is JITed and in memory), we have an opportunity to fundamentally change how inferlets interact.

I propose we implement nested inferlet calls. This allows inferlets to compose other inferlets directly, similar to library imports. With the recent addition of bakery (package manager) support, we can now efficiently manage these dependencies.

Design Sketch: Rust
Here is how this would look in the Rust runtime. Standard inferlets can be treated as imports, or pre-instantiated modules.

// Hypothetical SDK calls: `import` resolves an inferlet by name and `run`
// invokes its entry point with name/value argument pairs.
#[inferlet::main]
async fn main(mut args: Args) -> Result<()> {
    import("std/text-completion")
        .run(&[
            ("prompt", "Hello, my name is "),
            ("max_tokens", "20"),
            // IMPORTANT: Inferlet polymorphism via function composition. See below.
            ("drafter", "std/cacheback@latest"),
        ])
        .await?;

    import("std/python-3.12")
        .run(&[("code", "...")])
        .await?;

    import("std/javascript")
        .run(&[("code", "...")])
        .await?;

    Ok(())
}

// Standard inferlets are assumed to be always available and therefore have a pre-instantiated instance,
// so they could also be used as ordinary modules (inside an async function):

use inferlet::std::text_completion;

text_completion::run(&[
    ("prompt", "Hello, my name is "),
    ("max_tokens", "20"),
    ("drafter", "std/cacheback@latest"),
])
.await?;

Design Sketch: Python SDK
For Python, this pattern allows for "Inline Functions" where inferlets act as modular components. Since import is a reserved keyword in Python, we use a use() helper for dynamic inferlet execution.

# note that "std" dependencies can be omitted.
@inferlet(deps=["std/text-completion", "std/cacheback@latest"])
async def my_agent(prompt: str, max_tokens: int) -> str:

    # numpy is prepackaged in the std/python-3.12 inferlet environment
    import numpy as np

    # since "import" is a reserved keyword in Python, use "use" instead.
    response = await use("std/text-completion").run(
        prompt=prompt,
        max_tokens=max_tokens,
        drafter="std/cacheback@latest",
    )
    return response


# if the user wants to use custom packages, they must bring their own interpreter inferlet.
@inferlet(
    interpreter="ingim/my-custom-python-env",
    deps=[
        "ingim/my-custom-python-env",
    ],
)
async def my_agent_with_custom_env(prompt: str, max_tokens: int) -> str:
    import custom_package  # only available in the custom interpreter inferlet
    response = ...  # placeholder: build the response using custom_package
    return response


# using inferlets is as simple as calling a function:
import asyncio


async def main():
    with PieClient("localhost:8080"):
        response = await my_agent(
            prompt="Hello, my name is ",
            max_tokens=20,
        )


asyncio.run(main())

Manifest Changes (Pie.toml)
To support this, Pie.toml (introduced along with bakery, see std/text-completion in the v2 branch) needs a dedicated [dependencies] section. This allows the runtime to pre-fetch and cache the necessary WASM modules.

Current:

[package]
name = "std/text-completion"
version = "0.1.0"
description = "Simple text completion inferlet"
repository = "https://github.com/pie-project/pie"

[engine]
min_version = "^1.0.0"
[interface]
inputs = [
    { name = "prompt", type = "string", description = "The user message to complete" },
    { name = "system", type = "string", optional = true, description = "System prompt to set assistant behavior" },
    { name = "max_tokens", type = "int", optional = true, description = "Maximum number of tokens to generate (default: 256)" },
    { name = "temperature", type = "float", optional = true, description = "Sampling temperature (default: 0.6)" },
    { name = "top_p", type = "float", optional = true, description = "Top-p nucleus sampling threshold (default: 0.95)" },
]
outputs = [
    { name = "completion", type = "string", description = "The generated text completion" }
]

Updated Proposal:

[package]
name = "std/text-completion"
version = "0.1.0"
description = "Simple text completion inferlet"
repository = "https://github.com/pie-project/pie"

[engine]
min_version = "^1.0.0"
[interface]
inputs = [
    { name = "prompt", type = "string", description = "The user message to complete" },
    { name = "system", type = "string", optional = true, description = "System prompt to set assistant behavior" },
    { name = "max_tokens", type = "int", optional = true, description = "Maximum number of tokens to generate (default: 256)" },
    { name = "temperature", type = "float", optional = true, description = "Sampling temperature (default: 0.6)" },
    { name = "top_p", type = "float", optional = true, description = "Top-p nucleus sampling threshold (default: 0.95)" },
]
outputs = [
    { name = "completion", type = "string", description = "The generated text completion" }
]

[dependencies]
"std/text-completion" = "latest"
"std/cacheback" = "latest"

Inferlet Interfaces & Polymorphism
This architecture requires us to treat inferlets structurally. Just as Go interfaces define behavior rather than identity, an "Inferlet Interface" is simply a set of input/output signatures.

Any inferlet that matches the signature can be swapped in. This is critical for higher-order functions, such as passing a "drafter" inferlet into a text completion inferlet.

[interface]

inputs = [
    { name = "prompt", type = "string", description = "The user message to complete" },
    { name = "drafter", type = "inferlet(context:string)->string", optional = true, description = "Drafter inferlet module" },

]
outputs = [
    { name = "completion", type = "string", description = "The generated text completion" }
]
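One way to read "any inferlet that matches the signature can be swapped in" is as a structural compatibility check over the [interface] tables. The Rust sketch below assumes a simple rule (the candidate accepts every expected input with the same type, any extra inputs it has are optional, and it produces every expected output with the same type). The rule and the names `Param`, `Signature`, and `satisfies` are hypothetical, not part of bakery.

```rust
/// One entry in an [interface] inputs/outputs table (hypothetical model).
#[derive(Debug)]
struct Param {
    name: &'static str,
    ty: &'static str,
    optional: bool,
}

fn param(name: &'static str, ty: &'static str, optional: bool) -> Param {
    Param { name, ty, optional }
}

/// An "Inferlet Interface": only the input/output signature, not the identity.
struct Signature {
    inputs: Vec<Param>,
    outputs: Vec<Param>,
}

fn find<'a>(params: &'a [Param], name: &str) -> Option<&'a Param> {
    params.iter().find(|p| p.name == name)
}

/// Assumed structural rule: `candidate` can stand in for `expected` if it
/// accepts every expected input with the same type, its extra inputs are all
/// optional, and it produces every expected output with the same type.
fn satisfies(candidate: &Signature, expected: &Signature) -> bool {
    let inputs_ok = expected
        .inputs
        .iter()
        .all(|e| matches!(find(&candidate.inputs, e.name), Some(c) if c.ty == e.ty));
    let extras_optional = candidate
        .inputs
        .iter()
        .all(|c| c.optional || find(&expected.inputs, c.name).is_some());
    let outputs_ok = expected
        .outputs
        .iter()
        .all(|e| matches!(find(&candidate.outputs, e.name), Some(c) if c.ty == e.ty));
    inputs_ok && extras_optional && outputs_ok
}

fn main() {
    // Required drafter shape, mirroring `inferlet(context:string)->string`
    // (the output name "draft" is hypothetical).
    let drafter = Signature {
        inputs: vec![param("context", "string", false)],
        outputs: vec![param("draft", "string", false)],
    };
    // A hypothetical cacheback-like candidate with one extra optional input.
    let candidate = Signature {
        inputs: vec![param("context", "string", false), param("cache_size", "int", true)],
        outputs: vec![param("draft", "string", false)],
    };
    println!("compatible: {}", satisfies(&candidate, &drafter));
}
```

Under this rule, a drafter with an extra optional knob still fits the `inferlet(context:string)->string` slot, while one with an extra required input does not.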

@zyma98 what do you think? My proposal is far from complete. Your input is appreciated.


ingim · 2026-01-06T20:51:33Z · Maintainer

I just realized I missed explaining how this mechanism resolves the "fat interpreter" problem.

So basically, the server launches the "std/python-3.12" inferlet with the Python code as an input argument (the eval approach we discussed). This inferlet includes popular Python libraries (e.g., numpy).

If the user wants to import a custom library, they must build a new Python interpreter inferlet (with an input signature compatible with std/python-3.12) and register it.


zyma98 · 2026-01-07T03:44:50Z · Maintainer Author

Hey @ingim, thank you for suggesting the new approach. This is something I hadn't thought about. Now I'm starting to see a strong duality between inferlets and conventional programs regarding composition.

Language Specific Solution

It's the same in inferlet programming as in conventional programming. Composable parts are written in the same language and interact with function calls. One example is our current monolithic Rust inferlet library crate.

Language Agnostic Solution - Static Linking

In inferlet programming, this approach is to compile each composable unit into WASM. The interface between them is defined by WIT. One example is my experimental inferlib architecture and inferlet spawning with pie-cli --link. In conventional programming, the counterpart is static linking, where binary object files are compiled from potentially different languages and linked together.

Language Agnostic Solution - Dynamic Linking

This is my proposal in this RFC. Composable units are still compiled into WASM, but linking happens at runtime to allow applications to share a single library instance. In conventional programming, the counterpart is dynamic linking.

Language Agnostic Solution - Multiprocessing

This is your new proposal. Composable units are executable WASM. I feel like this is the multiprocessing approach in conventional programming.

I have a few concerns about this approach:

We'll need to stabilize the CLI interface for many std programs. (CLI in the general sense, i.e., the names, types, and invariants of the arguments that the main function expects.)
We might need to define a new CLI interface description language like WIT, which feels like reinventing the wheel.
The interactions between composable units have the highest overhead among these four approaches.
Although spawning one WASM instance might have acceptable overhead, with nested dependencies such as app -> libA -> libB -> libC, we might take a performance hit if app calls libA's function in a loop.
It's harder to make stateful dependencies work. In the first three approaches, the library can provide a struct or a WASM resource which keeps the state across library function/method invocations. It'll be more complex to provide statefulness with the multiprocessing approach.

With all that said, my suggestion is not to rule out any of these approaches at the moment. In conventional programming, all four approaches thrive. I'd recommend that we implement all four and reveal their pros and cons with quantitative performance statistics. Approaches 1 and 2 are already working. I have proof-of-concept code for approach 3 running. Approach 4 also doesn't sound hard.


ingim · 2026-01-07T04:43:23Z · Maintainer

@zyma98 Thank you for the comments! That makes a lot of sense.

After reflecting on your comment, I realized my motivations were actually:

making inferlets easily reusable via bakery - critical for building an inferlet ecosystem.
ensuring Python inline inferlets are performant - essential for broader adoption by Python users.

I am convinced that dynamic linking is the right mechanism to achieve this, because it is actually fully compatible with the design sketch I proposed earlier.

For example, we could express this as an explicit link call:

#[inferlet::main]
async fn main(mut args: Args) -> Result<()> {
    // Dynamically link to the interface at runtime and execute its main entry point
    // (hypothetical `link` call; arguments passed as name/value pairs)
    link("std/text-completion")
        .main(&[
            ("prompt", "Hello, my name is "),
            ("max_tokens", "20"),
            ("drafter", "std/cacheback@latest"),
        ])
        .await?;

    link("std/python-3.12")
        .main(&[("code", "...")])
        .await?;

    Ok(())
}

The manifest (Pie.toml) of "std/text-completion" can then define the interface (i.e., main's signature), so that it can be easily looked up via `bakery inferlet info text-completion`.

[interface.main]
inputs = ...
outputs = ...

I am currently cautious about "library inferlets" (inferlets that expose multiple public functions other than main). While I am not against the idea, I believe we should be careful to keep things simple and reusable.

Does this make sense to you?


zyma98 · 2026-01-07T19:45:52Z · Maintainer Author

Thank you for your comments @ingim! I gained some clarity about our proposed design. I'm summarizing it below to ensure that we are on the same page.

Application vs Library Inferlets

Mechanically, there is no difference at the interface level. Both application and library inferlets export interfaces. Application inferlets export the run() function. Library inferlets can export many functions or resources.

The proposed pie-cli load command will also work for application inferlets. Once loaded, the run() function will be available for other inferlets to call.

Load-time vs Runtime Symbol Resolution

My proposed approach will use load-time symbol resolution. The dependencies are recorded in the compiled WASM binary mirroring those imports and exports in the WIT files used during compilation. The engine verifies that the dependencies can be satisfied at load time, i.e., WASM instantiation time. The counterpart in conventional programming is linking against a dynamic library via the -l flag during compilation, and the resulting dependencies can be examined using the ldd tool on the compiled binary.

Your proposed approach will use runtime symbol resolution. The WASM binary does not record the dependencies. The counterpart in conventional programming is the POSIX dlopen and dlsym functions. There is a short code example in the dlopen man page if you are not very familiar with them.
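The practical difference between the two strategies is when a missing symbol is detected. In this hypothetical std-only Rust sketch, a `HashMap` of host functions stands in for the engine's exports, and the symbol names are made up: `check_imports` rejects a binary up front, like instantiating a component whose imports are recorded, while `call_by_name` only fails when a dlsym-style lookup happens at the call site.

```rust
use std::collections::HashMap;

type HostFn = fn(&str) -> String;

/// A hypothetical export provided by some loaded library.
fn upper(s: &str) -> String {
    s.to_uppercase()
}

/// Load-time resolution: the binary declares its imports, and all of them are
/// checked once, at instantiation. Returns the names that are missing.
fn check_imports(declared: &[&str], provided: &HashMap<&str, HostFn>) -> Result<(), Vec<String>> {
    let missing: Vec<String> = declared
        .iter()
        .filter(|name| !provided.contains_key(*name))
        .map(|name| name.to_string())
        .collect();
    if missing.is_empty() { Ok(()) } else { Err(missing) }
}

/// Runtime resolution: nothing is declared up front; the symbol is looked up by
/// name at the call site, so a missing symbol surfaces only when it is called.
fn call_by_name(provided: &HashMap<&str, HostFn>, name: &str, arg: &str) -> Option<String> {
    provided.get(name).map(|f| f(arg))
}

fn main() {
    let mut provided: HashMap<&str, HostFn> = HashMap::new();
    provided.insert("std:text/util#upper", upper);

    // Load time: fails before any application code runs.
    println!("{:?}", check_imports(&["std:text/util#upper", "std:text/util#lower"], &provided));
    // Runtime: succeeds or fails per call.
    println!("{:?}", call_by_name(&provided, "std:text/util#upper", "hi"));
    println!("{:?}", call_by_name(&provided, "std:text/util#lower", "hi"));
}
```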

I do have some concerns about using runtime symbol resolution plus exporting all functionality through main.

Runtime symbol resolution definitely allows more flexibility in applications. However, function calls across WASM components will need to go through more indirection and suffer from larger overhead. For now, I don't have an estimate of how large the additional overhead may be.

Exporting all functionality of an inferlet through the main function leads to some challenges. The main function's signature at the WASM level looks like this: fn main(args: Vec<String>) -> Result, which is not descriptive. Moreover, argument parsing takes additional time.

The code may also become less ergonomic. For example, a drafter may need to keep some state (e.g., Cacheback's cache) across decoding iterations. If the main function is the only available interface, the drafter needs to pass back a cache table handle as a string during initialization, like let handle = drafter.main(&["init"]), and the iteration loop has to pass the handle string, like drafter.main(&["decode", &handle]), making the code less readable.
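The ergonomics gap is easier to see side by side. In this hypothetical Rust sketch, `MainOnlyDrafter` exposes everything through a string-typed `main` and must keep its state in a table addressed by a handle string, while `Drafter` models a library that exports a resource-like type so the state lives in the value itself. The extra `context` argument and the `draft#N` outputs are placeholders, not Cacheback's actual behavior.

```rust
use std::collections::HashMap;

/// Style 1 (hypothetical): everything goes through a string-typed `main`,
/// so state is parked in a table and referred to by a handle string.
#[derive(Default)]
struct MainOnlyDrafter {
    caches: HashMap<String, Vec<String>>, // handle -> cached contexts
    next_id: u32,
}

impl MainOnlyDrafter {
    fn main(&mut self, args: &[&str]) -> Result<String, String> {
        match args {
            ["init"] => {
                let handle = format!("cache-{}", self.next_id);
                self.next_id += 1;
                self.caches.insert(handle.clone(), Vec::new());
                Ok(handle)
            }
            ["decode", handle, context] => {
                let cache = self.caches.get_mut(*handle).ok_or("unknown handle")?;
                cache.push(context.to_string());
                Ok(format!("draft#{}", cache.len()))
            }
            _ => Err(format!("unrecognized arguments: {args:?}")),
        }
    }
}

/// Style 2 (hypothetical): the library exports a resource-like type;
/// the state is owned by the value and methods are typed.
#[derive(Default)]
struct Drafter {
    cache: Vec<String>,
}

impl Drafter {
    fn decode(&mut self, context: &str) -> String {
        self.cache.push(context.to_string());
        format!("draft#{}", self.cache.len())
    }
}

fn main() {
    let mut via_main = MainOnlyDrafter::default();
    let handle = via_main.main(&["init"]).unwrap();
    let d1 = via_main.main(&["decode", handle.as_str(), "Hello"]).unwrap();

    let mut drafter = Drafter::default();
    let d2 = drafter.decode("Hello");
    println!("{d1} {d2}");
}
```

Both produce the same drafts, but the first style needs manual argument parsing, a handle table, and runtime errors for malformed calls that the typed interface rules out at compile time.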

I totally agree that we should make the component interface simple and reusable. In my envisioned design, we will just use the WIT file as the canonical interface definition. As long as we keep each library focused on a narrow purpose, I believe it's manageable.


ingim · 2026-01-07T20:06:25Z · Maintainer

Would it be possible to (1) share some example inferlet code using the proposed approach, both in Rust and inline Python, that calls into other inferlets (or inferlet libraries), and (2) imagine how it would fit into a package manager ecosystem? (E.g., can pie load be replaced by a [dependencies] section in the manifest? Should WIT be included as metadata in the bakery registry, or should it be generated on the fly from the Pie.toml manifest?)

As long as the end user experience is pleasant, I’m happy with your approach.


zyma98 · 2026-01-07T20:44:57Z · Maintainer Author

I'm not familiar with how the inline Python inferlet currently works. For Rust, it will look very similar to how we currently program. Here is the text completion application written with the inferlet libraries. The ChatFormatter, Context, etc., types are provided as resources from the inferlet libraries. Function and method calls just transparently cross into another WASM component.

We can definitely automate dependency loading with the package management system. We can specify the dependency libraries in a [dependencies] section of an inferlet. When we run an application with pie-cli run, pie-cli will perform the load under the hood if the libraries haven't already been loaded.

I prefer including the WIT as part of the metadata in the bakery registry. It will make it easier to verify that the specified dependencies have the correct WIT interfaces matched up.


ingim · 2026-01-08T08:59:12Z · Maintainer

If we go with the load-time resolution approach, it seems like this effectively moves us to the WASM component model standard. In this picture, Pie becomes just a specialized WASM component runtime, and "dynamic linking" is essentially solving the composition and "big binary" problem.

I agree this is a good technical direction; piggybacking on existing tech is usually better than reinventing the wheel.

However, I am worried about the user experience.

Most of our users (AI engineers, Python devs) won't want to learn WIT or manage wit-bindgen outputs.
For example, the current workflow in your application is intimidating for our target audience. If we naively adopt the standard model, we force users to:

Learn WIT: Understand Interface Types to make inferlets reusable.
Manage Bindings: Deal with verbose wit-bindgen outputs (multiple .rs or .py files) just to import a library.
Manage Dependencies: Download WIT files for client-side compilation.

Since Pie is an LLM serving system, we should balance flexibility with simplicity.
I see two ways to handle this. I'd love your take on which trade-off we should accept:

Option 1: Hide the Component Model with a "magic tool"
We use standard WASM components under the hood, but we hide all the complexity behind bakery and pie CLI. Users never see a .wit file; Pie.toml remains the single source of truth.

bakery build would have to do the heavy lifting:

Auto-generate the WIT World based on Pie.toml dependencies.
Auto-generate exports based on the [interface] table.

It keeps the manifest simple:

[package]
name = "my-agent"

# Auto-generates the WIT export for the main entry point
[interface]
inputs = [
    { name = "prompt", type = "string" },
    { name = "temperature", type = "float", optional = true }
]
outputs = [
    { name = "completion", type = "string" }
]

# used to generate WIT imports automatically
[dependencies]
"zyma98/my-inferlet" = "0.1.0"

Trade-off: We get performance and standards compliance, but the tooling (codegen) becomes much more complex to build.
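To gauge how much codegen Option 1 needs, here is a rough Rust sketch of one step that bakery build might perform: turning the [interface] table of the manifest above into a WIT function declaration for main. The type mapping (int to s64, float to f64), the snake_case to kebab-case renaming (WIT identifiers are kebab-case), the tuple convention for multiple outputs, and the names `Field` and `wit_main` are all assumptions.

```rust
/// One entry of the [interface] inputs/outputs arrays in Pie.toml.
struct Field<'a> {
    name: &'a str,
    ty: &'a str,
    optional: bool,
}

/// Assumed mapping from Pie.toml types to WIT types.
fn wit_type(ty: &str) -> Result<&'static str, String> {
    match ty {
        "string" => Ok("string"),
        "int" => Ok("s64"),
        "float" => Ok("f64"),
        "bool" => Ok("bool"),
        other => Err(format!("unsupported type: {other}")),
    }
}

/// Renders one parameter; optional inputs become `option<T>`.
fn wit_field(f: &Field) -> Result<String, String> {
    let name = f.name.replace('_', "-"); // WIT identifiers are kebab-case
    let ty = wit_type(f.ty)?;
    Ok(if f.optional { format!("{name}: option<{ty}>") } else { format!("{name}: {ty}") })
}

/// Emits a WIT declaration for `main`. One output becomes a plain return type;
/// several outputs become a tuple (an assumed convention).
fn wit_main(inputs: &[Field], outputs: &[Field]) -> Result<String, String> {
    let params = inputs.iter().map(wit_field).collect::<Result<Vec<_>, _>>()?.join(", ");
    let results = outputs.iter().map(|f| wit_type(f.ty)).collect::<Result<Vec<_>, _>>()?;
    let ret = match results.as_slice() {
        [] => String::new(),
        [one] => format!(" -> {one}"),
        many => format!(" -> tuple<{}>", many.join(", ")),
    };
    Ok(format!("main: func({params}){ret};"))
}

fn main() {
    // Mirrors the my-agent manifest above.
    let inputs = [
        Field { name: "prompt", ty: "string", optional: false },
        Field { name: "temperature", ty: "float", optional: true },
    ];
    let outputs = [Field { name: "completion", ty: "string", optional: false }];
    println!("{}", wit_main(&inputs, &outputs).unwrap());
}
```

For the my-agent manifest, this prints `main: func(prompt: string, temperature: option<f64>) -> string;`. Generating the matching imports from [dependencies] would additionally require fetching each dependency's WIT.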

Option 2: Runtime Symbol Resolution
We skip the component model for user code and use plain WASM modules with a mechanism closer to dlopen/dlsym on the server. Clean & simple.

Trade-off: Dead simple for users and simple tooling, but we can lose performance, type safety, and the ability to easily pass complex data structures.

What do you think?


zyma98 · 2026-01-08T17:16:20Z · Maintainer Author

Thanks for the clarification! I personally prefer Option 1. If we later figure out that only a subset of the Wasm component interface model is usually used in Pie, we can leverage it to simplify our in-house interface definition and how bakery handles them.

1 reply

ingim Jan 8, 2026
Maintainer

Sounds good. Let's go with Option 1.

Yeah, I think for now we can have a .wit interface associated with the manifest if the inferlet exports anything other than main.

ingim · 2026-01-08T17:58:30Z · Maintainer

Since this will shape the future of the inferlet ecosystem, I think we need to prioritize its development.

I see three remaining tasks:

(1). redesign "standard inferlets" (inferlets in the "std" namespace) and the library inferlets that glue them together based on the Component Model.

Some will export a main (e.g., beam search, constrained decoding), while others will be pure libraries (e.g., context, cacheback drafter).
We can borrow some design from your inferlib API design.

(2). update bakery

Incorporate .wit interfaces.
The build command should pull all dependent .wit files and codegen the world and bindings.
The first prototype can just leverage static linking.

(3). implement dynamic linking

We could do (1) together.
I can take a stab at 2, while you are working on 3.

1 reply

zyma98 Jan 8, 2026
Maintainer Author

Sounds fantastic! I'll form a dev plan once our v2 backend lands in main. I'll play with the v2 code this afternoon.

zyma98 · 2026-01-28T02:16:39Z · Maintainer Author

Summarizing our discussion for the next steps to incorporate dynamic linking with Bakery and forming a dev plan. Please let me know how it looks. @ingim

Package Search Path

We'll make bakery the canonical source for downloading both inferlets and libraries. I'll use the term package to refer to either an inferlet or a library. We'll refactor the client upload_program function (maybe renaming it to upload_package) so that it accepts a directory as its argument. The directory should contain the Wasm binary and its manifest TOML file. The requirements for uploading the directory to the registry and to the engine are the same.

On the engine side, we'll implement a two-level search. The engine will first search through the packages uploaded to the engine. If the specified package is not found there, it will then search the registry.
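A minimal sketch of the two-level lookup, assuming packages are plain byte blobs keyed by name (the `resolve` function and `Source` enum are hypothetical): a package uploaded to the engine takes precedence, and the registry is consulted only as a fallback.

```rust
use std::collections::HashMap;

/// Where a package was found (hypothetical).
#[derive(Debug, PartialEq)]
enum Source {
    Engine,
    Registry,
}

/// Two-level lookup: packages uploaded to the engine shadow packages of the
/// same name in the registry; the registry is only a fallback.
fn resolve<'a>(
    name: &str,
    uploaded: &'a HashMap<String, Vec<u8>>,
    registry: &'a HashMap<String, Vec<u8>>,
) -> Option<(Source, &'a [u8])> {
    if let Some(bin) = uploaded.get(name) {
        return Some((Source::Engine, bin.as_slice()));
    }
    registry.get(name).map(|bin| (Source::Registry, bin.as_slice()))
}

fn main() {
    let mut uploaded = HashMap::new();
    uploaded.insert("my-inferlet".to_string(), b"uploaded".to_vec());
    let mut registry = HashMap::new();
    registry.insert("std/cacheback".to_string(), b"from registry".to_vec());

    for name in ["my-inferlet", "std/cacheback", "std/unknown"] {
        match resolve(name, &uploaded, &registry) {
            Some((source, bin)) => println!("{name}: {source:?}, {} bytes", bin.len()),
            None => println!("{name}: not found"),
        }
    }
}
```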

Dependency Specification

Each package's dependencies will be specified in its manifest file. During inferlet instantiation or library load time, the engine will recursively load the dependencies if they haven't already been loaded. Since we'll refactor the engine to accept package uploads, the engine will always have the dependency information, either from the uploaded package or from the registry; therefore, the client need not specify the dependencies again when instantiating an inferlet.
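The recursive loading described above is essentially a depth-first traversal of the dependency graph. The hypothetical sketch below loads dependencies before their dependents, skips packages that are already loaded, and reports a cycle instead of recursing forever; `Manifests` is an assumed name-to-dependencies view of the manifests, and pushing onto `loaded` stands in for instantiating the package.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical manifest view: package name -> names of its dependencies.
type Manifests = HashMap<&'static str, Vec<&'static str>>;

/// Loads `pkg` after recursively loading its dependencies (depth-first).
/// Already-loaded packages are skipped; `in_progress` detects cycles.
fn load(
    pkg: &'static str,
    manifests: &Manifests,
    loaded: &mut Vec<&'static str>,
    in_progress: &mut HashSet<&'static str>,
) -> Result<(), String> {
    if loaded.contains(&pkg) {
        return Ok(());
    }
    if !in_progress.insert(pkg) {
        return Err(format!("dependency cycle through {pkg}"));
    }
    let deps = manifests.get(pkg).ok_or_else(|| format!("package not found: {pkg}"))?;
    for &dep in deps {
        load(dep, manifests, loaded, in_progress)?;
    }
    in_progress.remove(pkg);
    loaded.push(pkg); // stands in for instantiating the package
    Ok(())
}

fn main() {
    let mut manifests: Manifests = HashMap::new();
    manifests.insert("my-agent", vec!["std/text-completion", "std/cacheback"]);
    manifests.insert("std/text-completion", vec!["std/context"]);
    manifests.insert("std/cacheback", vec!["std/context"]);
    manifests.insert("std/context", vec![]);

    let mut loaded = Vec::new();
    load("my-agent", &manifests, &mut loaded, &mut HashSet::new()).unwrap();
    println!("load order: {loaded:?}");
}
```

In this example, the shared std/context library is loaded once, before either package that depends on it.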

Library Ergonomics

We'll provide a smooth experience for inferlet developers who want to use libraries from the registry. The Wasm binary uploaded to the registry already contains the interface information, which can be examined with tools like `wasm-tools component wit example.wasm`. The registry will automatically generate language-specific bindings from the WIT interface description, and inferlet developers will pull those bindings from the registry. For example, Rust developers can pull the Rust bindings with one line in their Cargo.toml:

[package]
name = "my-inferlet"
version = "0.1.0"
edition = "2024"

[dependencies]
inferlet-std = { version = "1.0", registry = "pie-registry" }

Dev Plan

(Done) Implement the two-level package search path described above.
(Done) Implement items described in Dependency Specification.
(Done) Port over the dynamic linking implementation from PR #245 (Support inferlet libraries and dynamic linking).
(Done) Add some libraries to the registry and confirm the new dynamic linking mechanism works end-to-end.
(Ongoing) Add language-specific bindings to existing libraries.
(Ongoing) Automate language-specific bindings generation from WIT.

PRs

Related Discussions

New manifest format: RFC: Inferlet Package Manager #98 (comment)
New Wasm package naming and interface architecture: RFC: Inferlet Package Manager #98 (comment)


ingim · 2026-01-28T03:09:48Z · Maintainer

This design is a bit scary to me because it effectively requires us to become package registry maintainers. Yet it sounds fantastic for the developer experience. Can we do a quick sanity check to confirm that hosting our own Crates/NPM/PyPI endpoints is not too complicated?

3 replies

ingim Jan 28, 2026
Maintainer

At the same time, I am amazed that we can even consider this as a design choice! AI has really expanded what we can achieve.

zyma98 Jan 28, 2026
Maintainer Author

Let me try to create a custom Cargo registry and see how much effort it requires.

zyma98 Jan 28, 2026
Maintainer Author

It turned out to be pretty simple to host a custom registry (at least verified for Rust). Below is the proof of concept demo:

`.cargo/config.toml`

[registries]
my-registry = { index = "sparse+https://zyma98.github.io/custom-rust-registry/" }

`src/main.rs`

fn main() {
    let message = hello_world::hello();
    println!("{message}");
}

`Cargo.toml`

[package]
name = "hello-world-app"
version = "0.1.0"
edition = "2024"

[dependencies]
hello-world = { version = "0.1.1", registry = "my-registry" }
