A Machine-Learning Guided Yul Pass Orchestrator and Bytecode Superoptimizer for the Solidity Compiler
Every smart contract deployed on Ethereum has a financial cost attached to every operation it executes. The standard Solidity compiler (solc) attempts to reduce these gas costs by running a fixed, hardcoded sequence of transformation passes on its intermediate representation, Yul.
NeuralYul replaces this static heuristic with a reinforcement learning architecture. By embedding the compiler's intermediate representation into a continuous vector space and orchestrating passes dynamically, NeuralYul chooses a pass order per contract to reduce execution gas or zk-proof constraint counts, and every result must pass equivalence checking against the original code before it is released.
NeuralYul operates across two distinct optimization boundaries to bridge the semantic gap between high-level logic and low-level machine constraints.
The first is an Ahead-of-Time (AOT) compiler orchestrator embedded directly inside a modified solc binary.
- State Representation: Converts the Yul AST into a Heterogeneous Graph Attention Network (GAT) embedding combined with a Transformer encoder layer to capture both local data-flow and global state dependencies.
- Policy Orchestration: A Proximal Policy Optimization (PPO) agent predicts the optimal sequence of the compiler's 24 built-in transformation passes (e.g., `DeadCodeEliminator`, `FullInliner`) based on the specific contract topology.
- Pre-training: The graph encoder is pre-trained on the YulCode dataset (350,000 contracts) for gas regression and pass applicability classification.
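One way to picture the action space above is as a mapping from pass names to the one-letter step abbreviations that solc accepts through its `--yul-optimizations` option. The sketch below is illustrative only: the table covers a handful of passes, with abbreviations taken from the Solidity optimizer documentation (worth re-checking against the solc version in use), and `encode_sequence` and `solc_args` are hypothetical helper names, not NeuralYul's API.

```python
# Subset of solc's Yul optimizer step abbreviations. Assumed from the
# Solidity optimizer docs; verify against the solc version you run.
STEP_ABBREVIATIONS = {
    "VarDeclInitializer": "d",
    "FunctionHoister": "h",
    "BlockFlattener": "f",
    "ForLoopInitRewriter": "o",
    "DeadCodeEliminator": "D",
    "FunctionGrouper": "g",
    "UnusedPruner": "u",
    "FullInliner": "i",
}


def encode_sequence(pass_names):
    """Turn an ordered list of pass names into a solc step string."""
    unknown = [name for name in pass_names if name not in STEP_ABBREVIATIONS]
    if unknown:
        raise ValueError(f"unknown passes: {unknown}")
    return "".join(STEP_ABBREVIATIONS[name] for name in pass_names)


def solc_args(yul_path, pass_names):
    """Build (but do not run) a solc command line for a chosen sequence."""
    return [
        "solc", "--strict-assembly", "--optimize",
        "--yul-optimizations", encode_sequence(pass_names),
        yul_path,
    ]


# Hypothetical sequence an agent might emit, one pass per step.
chosen = ["VarDeclInitializer", "FunctionHoister", "BlockFlattener",
          "ForLoopInitRewriter", "DeadCodeEliminator"]
print(encode_sequence(chosen))
```

Keeping the agent's output as a list of pass names and encoding it only at the compiler boundary makes invalid actions fail loudly instead of producing a malformed step string.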
The second is a post-compilation reinforcement learning loop that directly mutates raw EVM opcodes to squeeze out micro-gas inefficiencies.
- Concolic Execution: Resolves dynamic `JUMP` and `JUMPI` destinations using abstract interpretation and concrete fuzzing traces to reconstruct a deterministic Control Flow Graph.
- Constrained MDP (CMDP): The RL environment is formulated using Lagrangian duality to enforce mathematical safety. The agent maximizes gas savings while a strict cost function penalizes semantic divergence.
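- Objective Function: $\max_\theta \min_{\lambda \geq 0} \mathcal{L}(\theta, \lambda) = J_R(\pi_\theta) - \lambda \cdot (J_C(\pi_\theta) - \epsilon)$, where $J_R$ is the expected gas-saving return, $J_C$ is the expected semantic-divergence cost, and $\epsilon$ is the tolerated cost budget.
- Hardware Enforcement: The EVM's 16-slot stack reach (`DUP` and `SWAP` can address only the top 16 stack items) is enforced during RL exploration via dynamic action masking.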
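The two constraint mechanisms in this list can be sketched in a few lines. This is a minimal illustration, not the project's training code: the function names, step size and numeric inputs are made up, and the mask encodes only the general EVM rules that `DUPn` needs at least n stack items, `SWAPn` needs at least n+1, and the stack holds at most 1024 items.

```python
def lagrangian(j_reward, j_cost, lam, epsilon):
    """L(theta, lambda) = J_R - lambda * (J_C - epsilon)."""
    return j_reward - lam * (j_cost - epsilon)


def update_multiplier(lam, j_cost, epsilon, step_size):
    """Projected gradient step for the inner min over lambda >= 0.

    dL/dlambda = -(J_C - epsilon), so descending on L raises lambda
    whenever the measured cost exceeds the budget epsilon.
    """
    return max(0.0, lam + step_size * (j_cost - epsilon))


def stack_action_mask(stack_depth, max_depth=1024):
    """Return {opcode: allowed} for PUSH, DUP1..DUP16 and SWAP1..SWAP16."""
    has_room = stack_depth < max_depth
    mask = {"PUSH": has_room}
    for n in range(1, 17):
        # DUPn copies the n-th item and grows the stack by one.
        mask[f"DUP{n}"] = stack_depth >= n and has_room
        # SWAPn exchanges the top item with the (n+1)-th item.
        mask[f"SWAP{n}"] = stack_depth >= n + 1
    return mask


# Hypothetical numbers: cost 0.5 is over the budget 0.25, so lambda grows.
lam = update_multiplier(0.0, j_cost=0.5, epsilon=0.25, step_size=1.0)
print(lam, lagrangian(10.0, 0.5, lam, 0.25))
print([op for op, ok in stack_action_mask(3).items() if ok])
```

Masking removes unreachable stack operations from the action set before sampling, whereas the multiplier only penalizes constraint violations after they are observed; the two are complementary.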
No optimized contract leaves the system without passing strict verification.
- Fast Inner Loop: Differential fuzzing validates output, storage state, and gas reduction across thousands of generated calldata inputs.
- Formal Verification (Z3): The final optimized bytecode is proven semantically equivalent to the original code using the Z3 theorem prover.
- UIF Abstraction: `KECCAK256` state-space explosion is avoided by modelling the hash as an Uninterpreted Function (UIF), so equivalence proofs rely only on functional congruence (equal inputs produce equal outputs).
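The fast inner loop in this list can be illustrated with a toy differential fuzzer. The sketch below is a simplification under stated assumptions: it compares plain Python functions standing in for the baseline and optimised code, and checks return values only, whereas the pipeline described above also compares storage state and gas. All names are hypothetical.

```python
import random

MASK256 = (1 << 256) - 1  # EVM words are 256 bits wide


def baseline(x, y):
    """Reference behaviour: (x + y) * 2 with 256-bit wraparound."""
    return ((x + y) * 2) & MASK256


def optimised(x, y):
    """Candidate rewrite: replace the multiplication with a left shift."""
    return ((x + y) << 1) & MASK256


def differential_fuzz(reference, candidate, trials=1000, seed=0):
    """Return the first input on which the two functions disagree, or None."""
    rng = random.Random(seed)
    # Boundary values first, since overflow bugs tend to hide there.
    edge_cases = [(0, 0), (MASK256, 1), (MASK256, MASK256)]
    random_cases = [(rng.getrandbits(256), rng.getrandbits(256))
                    for _ in range(trials)]
    for x, y in edge_cases + random_cases:
        if reference(x, y) != candidate(x, y):
            return (x, y)
    return None


print(differential_fuzz(baseline, optimised))
```

A fuzzer like this can only find counterexamples; passing it is evidence, not proof, which is why the solver-based stage follows it.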
- Zero-Copy Rust Execution Environment: Evaluates the RL reward function by wrapping `revm` (the fastest Rust EVM) in a shared memory ring buffer. This bypasses Python-Rust FFI serialization bottlenecks to achieve ~100,000 state transitions per second.
- Dual-Target Cost Functions: Compiles optimally for either Ethereum Mainnet (penalizing `SSTORE`) or zkEVM rollups (penalizing `KECCAK256` and optimizing for PLONK constraint reduction).
- Multi-Agent RL (MARL): For large DeFi protocols, NeuralYul assigns independent agents to separate Yul functions under a single coordinator, utilizing leave-one-out attribution for accurate credit assignment.
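Leave-one-out attribution, mentioned in the last item, credits each agent with the difference between the team reward and the reward obtained when that agent's action is replaced by a default. The following is a minimal sketch with an invented reward function and invented gas numbers; it is not the coordinator's actual implementation.

```python
def leave_one_out_credit(agent_actions, team_reward, default_action=None):
    """credit_i = R(all actions) - R(actions with agent i set to default)."""
    full = team_reward(agent_actions)
    credits = {}
    for agent in agent_actions:
        counterfactual = dict(agent_actions)
        counterfactual[agent] = default_action
        credits[agent] = full - team_reward(counterfactual)
    return credits


# Made-up per-function gas savings, plus a bonus that only appears when
# two functions are optimised together (a non-additive interaction).
SAVINGS = {"transfer": 120, "approve": 0, "swap": 450}
JOINT_BONUS = 30


def toy_team_reward(actions):
    active = {fn for fn, act in actions.items() if act is not None}
    reward = sum(SAVINGS[fn] for fn in active)
    if {"transfer", "swap"} <= active:
        reward += JOINT_BONUS
    return reward


actions = {"transfer": "inline", "approve": "inline", "swap": "inline"}
print(leave_one_out_credit(actions, toy_team_reward))
```

Because of the joint bonus, the individual credits sum to more than the team reward, which illustrates that leave-one-out measures each agent's marginal contribution rather than splitting the reward.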
NeuralYul hooks directly into standard development frameworks via a lightweight Rust wrapper, requiring no new security assumptions or workflow changes.
Fast Development Loop (Seconds)

Use the embedded C++ Yul-MLGO model for rapid, phase-ordered structural optimization.

`forge build --ml-optimize`

This repo contains the full infrastructure for the system. The neural networks
and the YulCode dataset are intentionally not included — see
docs/INFRASTRUCTURE.md for the exact spec→file map and the
model boundary in src/neuralyul/models/.
src/neuralyul/
  config.py        # all hyperparameters + 36-d feature layout (single source of truth)
  pdg/             # Yul Program Dependence Graph parser (AST + CFG + DFG, labelled edges)
  data/            # NFM/EP/SE augmentations + PyG two-view contrastive dataset engine
  models/          # typed interfaces ONLY: the NNs plug in here (no weights shipped)
  env/             # 24 Yul passes, solc driver, Gymnasium env (reward = gas saved)
  reward/          # zero-copy gas bridge (mock + PyO3) + shared-memory ring buffer client
  correctness/     # differential fuzzer + Z3 equivalence (KECCAK as an uninterpreted fn)
  daemon.py        # FastAPI superoptimizer/DevEx API
executor/          # Rust: SPSC ring buffer + GasBackend (Mock / revm) + PyO3 + worker bin
solc-plugin/       # C++: FeatureExtractor + YulMLRunner (ONNX) + Suite.cpp hook
orchestration/     # Rust Foundry wrapper (neuralyul-forge)
Everything below runs with only networkx + solc (the gas evaluator falls back
to a dependency-free deterministic mock; the real EVM is one Cargo feature away).
nix develop # toolchain: python, rust, maturin, solc, z3
pip install -e '.[dev]' # base install (+ pytest)
# Inspect the pipeline without any model:
neuralyul parse src/L1-encoder/sample.yul # PDG topology + edge-type stats
neuralyul passes # the 24-action Yul pass space
neuralyul gas src/L1-encoder/sample.yul dhfoDgvulfn # compile a sequence, measure gas
neuralyul fuzz src/L1-encoder/sample.yul # differential-fuzz baseline vs optimised
pytest # optional-dep tests self-skip

cargo build --release --manifest-path executor/Cargo.toml # lib + worker (mock backend)
maturin develop -m executor/Cargo.toml # Python ext (mock)
maturin develop -m executor/Cargo.toml --features revm # Python ext (REAL EVM)

Run the multi-process zero-copy loop: Python `RingBufferClient.create(...)`, then
`neuralyul-worker /tmp/neuralyul_gas_ring` attaches to the same mmap segment.
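To make the shared-segment idea concrete, here is a small single-producer/single-consumer ring over a memory-mapped file, written with the Python standard library only. It is a sketch under loud assumptions: the header layout, slot size, capacity and the `DemoRing` class are invented for illustration and do not describe the repo's `RingBufferClient` or the Rust worker's wire format. Python also offers no atomics here, so a real cross-process version would need proper acquire/release ordering on the indices.

```python
import mmap
import os
import struct
import tempfile

HEADER = struct.Struct("<QQ")   # write index, read index (monotonic counters)
SLOT = struct.Struct("<Q")      # one gas measurement per slot (assumed layout)
CAPACITY = 8                    # number of slots (hypothetical)
SIZE = HEADER.size + CAPACITY * SLOT.size


class DemoRing:
    """Fixed-size SPSC queue stored in a file-backed shared mapping."""

    def __init__(self, path, create=False):
        if create:
            with open(path, "wb") as f:
                f.write(b"\x00" * SIZE)
        self._file = open(path, "r+b")
        self._mm = mmap.mmap(self._file.fileno(), SIZE)

    def push(self, value):
        write, read = HEADER.unpack_from(self._mm, 0)
        if write - read == CAPACITY:
            return False  # ring is full
        offset = HEADER.size + (write % CAPACITY) * SLOT.size
        SLOT.pack_into(self._mm, offset, value)
        # Publish the slot only after its payload has been written.
        struct.pack_into("<Q", self._mm, 0, write + 1)
        return True

    def pop(self):
        write, read = HEADER.unpack_from(self._mm, 0)
        if write == read:
            return None  # ring is empty
        offset = HEADER.size + (read % CAPACITY) * SLOT.size
        (value,) = SLOT.unpack_from(self._mm, offset)
        struct.pack_into("<Q", self._mm, 8, read + 1)
        return value

    def close(self):
        self._mm.close()
        self._file.close()


path = os.path.join(tempfile.mkdtemp(), "demo_ring")
producer = DemoRing(path, create=True)
consumer = DemoRing(path)  # second mapping of the same file
producer.push(21000)
producer.push(20000)
print(consumer.pop(), consumer.pop(), consumer.pop())
```

Only the producer writes the write index and only the consumer writes the read index, which is what lets an SPSC ring avoid locks.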
pip install -e '.[graph]' # torch + torch_geometric: augmentations & dataset engine
pip install -e '.[rl]' # gymnasium: the RL environment
pip install -e '.[verify]' # z3-solver: the formal equivalence gate
pip install -e '.[daemon]' # fastapi/uvicorn: the superoptimizer daemon

What's excluded: the GIN/PPO/PRM network bodies, their training loops, and the 350k-contract dataset. The infra is wired to typed seams so dropping a trained model in requires no changes to the parser, env, reward bridge, or correctness gate.
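A typed seam of this kind could look roughly like the following. The sketch is hypothetical throughout: `PassPolicy`, `RandomPolicy` and `rollout` are invented names, and it assumes the policy receives a single 36-element feature vector and returns an index into the 24-pass action space. The real interfaces live in src/neuralyul/models/ and may differ.

```python
import random
from typing import List, Protocol, Sequence

FEATURE_DIM = 36   # feature width listed for config.py (assumed per-state here)
NUM_PASSES = 24    # size of the Yul pass action space


class PassPolicy(Protocol):
    """Hypothetical seam: anything with this method can drive the env."""

    def select_pass(self, features: Sequence[float]) -> int:
        ...


class RandomPolicy:
    """Model-free stand-in that satisfies PassPolicy structurally."""

    def __init__(self, seed: int = 0) -> None:
        self._rng = random.Random(seed)

    def select_pass(self, features: Sequence[float]) -> int:
        if len(features) != FEATURE_DIM:
            raise ValueError(f"expected {FEATURE_DIM} features")
        return self._rng.randrange(NUM_PASSES)


def rollout(policy: PassPolicy, features: Sequence[float], steps: int) -> List[int]:
    """Ask the policy for `steps` pass indices given a fixed feature vector."""
    return [policy.select_pass(features) for _ in range(steps)]


print(rollout(RandomPolicy(), [0.0] * FEATURE_DIM, steps=5))
```

Because `Protocol` uses structural typing, a trained network wrapper only has to expose the same method; nothing upstream needs to import or subclass it.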