mdgspace/Neural-Yul

NeuralYul

A Machine-Learning Guided Yul Pass Orchestrator and Bytecode Superoptimizer for the Solidity Compiler

Every smart contract deployed on Ethereum has a financial cost attached to every operation it executes. The standard Solidity compiler (solc) attempts to reduce these gas costs by running a fixed, hardcoded sequence of transformation passes on its intermediate representation, Yul.

NeuralYul replaces this static heuristic with a reinforcement learning architecture. By embedding the compiler's intermediate representation into a continuous vector space and orchestrating passes dynamically, NeuralYul aims to minimize execution gas or zk-proof constraints, while the correctness gate verifies that each optimized output remains semantically equivalent to the original.


🧠 System Architecture

NeuralYul operates across two distinct optimization boundaries, followed by a correctness gate, to bridge the semantic gap between high-level logic and low-level machine constraints.

1. Yul-MLGO (The Strategist)

An Ahead-of-Time (AOT) compiler orchestrator embedded directly inside a modified solc binary.

  • State Representation: Converts the Yul AST into a Heterogeneous Graph Attention Network (GAT) embedding combined with a Transformer encoder layer to capture both local data-flow and global state dependencies.
  • Policy Orchestration: A Proximal Policy Optimization (PPO) agent predicts the optimal sequence of the compiler's 24 built-in transformation passes (e.g., DeadCodeEliminator, FullInliner) based on the specific contract topology.
  • Pre-training: The graph encoder is pre-trained on the YulCode dataset (350,000 contracts) for gas regression and pass applicability classification.
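As a rough illustration of the pass-orchestration loop described above, the sketch below rolls out a policy over a pass sequence and rewards each step by the gas it saves. It is a toy under stated assumptions: `mock_gas`, `rollout`, `alternating_policy`, the two-pass subset, and all gas numbers are hypothetical stand-ins, not the repository's environment or real solc measurements.

```python
# Hypothetical two-pass subset; the real action space has 24 passes.
PASSES = ["DeadCodeEliminator", "FullInliner"]


def mock_gas(sequence):
    """Stand-in gas model (an assumption, NOT solc): each distinct pass
    saves a fixed amount the first time it appears in the sequence."""
    savings = {"DeadCodeEliminator": 120, "FullInliner": 300}
    return 10_000 - sum(savings[p] for p in set(sequence))


def rollout(policy, horizon=4):
    """Apply `horizon` passes chosen by `policy`; reward = gas saved per step."""
    sequence, total_reward = [], 0
    gas = mock_gas(sequence)
    for _ in range(horizon):
        action = policy(sequence)      # the policy sees the passes applied so far
        sequence.append(action)
        new_gas = mock_gas(sequence)
        total_reward += gas - new_gas  # positive when the pass reduced gas
        gas = new_gas
    return sequence, total_reward


def alternating_policy(sequence):
    """Trivial placeholder policy: cycle through the pass list."""
    return PASSES[len(sequence) % len(PASSES)]
```

In this toy the per-step rewards telescope, so the episode return equals the baseline gas minus the final gas; a learned policy would replace `alternating_policy`.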

2. Bytecode Superoptimizer (The Tactician)

A post-compilation reinforcement learning loop that directly mutates raw EVM opcodes to squeeze out micro-gas inefficiencies.

  • Concolic Execution: Resolves dynamic JUMP and JUMPI destinations using abstract interpretation and concrete fuzzing traces to reconstruct a deterministic Control Flow Graph.
  • Constrained MDP (CMDP): The RL environment is formulated using Lagrangian duality to enforce mathematical safety. The agent maximizes gas savings while a strict cost function penalizes semantic divergence.
  • Objective Function: $\max_\theta \min_{\lambda \geq 0} \mathcal{L}(\theta, \lambda) = J_R(\pi_\theta) - \lambda \cdot (J_C(\pi_\theta) - \epsilon)$
  • Hardware Enforcement: The EVM's 16-slot stack-access limit (DUP and SWAP stop at DUP16 and SWAP16, even though the stack itself holds up to 1024 items) is enforced during RL exploration via dynamic action masking.
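Two pieces of the list above can be sketched in a few lines: the Lagrangian with a projected update for the multiplier λ, and an action mask over stack-access opcodes. This is a minimal sketch, not the project's training code; the function names, the learning rate, and the choice to mask only DUP/SWAP are assumptions made for illustration.

```python
def lagrangian(j_reward, j_cost, lam, epsilon):
    """L(theta, lambda) = J_R - lambda * (J_C - epsilon), as in the objective above."""
    return j_reward - lam * (j_cost - epsilon)


def dual_step(lam, j_cost, epsilon, lr=0.1):
    """One projected gradient-descent step on lambda (the inner min).

    dL/dlambda = -(J_C - epsilon), so descending raises lambda while the
    cost constraint is violated and lowers it otherwise, clipped at zero.
    """
    return max(0.0, lam + lr * (j_cost - epsilon))


def stack_action_mask(depth, max_depth=1024):
    """Map each DUPn/SWAPn opcode to whether it is legal at this stack depth."""
    mask = {}
    for n in range(1, 17):
        # DUPn copies the n-th item from the top and pushes it (stack grows by one).
        mask[f"DUP{n}"] = n <= depth < max_depth
        # SWAPn exchanges the top item with the item n positions below it.
        mask[f"SWAP{n}"] = depth >= n + 1
    return mask
```

During exploration, the mask would be applied to the policy's action logits so that illegal opcodes are never sampled, while λ grows whenever the measured semantic-divergence cost exceeds ε.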

3. The Correctness Gate

No optimized contract leaves the system without passing strict verification.

  • Fast Inner Loop: Differential fuzzing validates output, storage state, and gas reduction across thousands of generated calldata inputs.
  • Formal Verification (Z3): The final hyper-optimized bytecode is mathematically proven against the original code using the Z3 Theorem Prover.
  • UIF Abstraction: KECCAK256 state space explosions are bypassed by modeling the hash as an Uninterpreted Function (UIF), so equivalence proofs rely only on functional congruence (equal inputs yield equal outputs) rather than on the hash's internals.
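The fast inner loop of the gate can be pictured with the toy below. It is a sketch under assumptions: contracts are modeled as plain Python callables returning `(output, storage, gas)`, and `differential_fuzz`, `baseline`, `optimised`, and `broken` are hypothetical names rather than the repository's fuzzer API.

```python
import random


def differential_fuzz(baseline, optimised, trials=1000, seed=0):
    """Return the first calldata on which the two versions disagree, else None.

    A candidate fails if its output or storage differs from the baseline,
    or if it uses more gas than the baseline.
    """
    rng = random.Random(seed)
    for _ in range(trials):
        length = rng.randrange(0, 68)
        calldata = bytes(rng.getrandbits(8) for _ in range(length))
        out_a, store_a, gas_a = baseline(calldata)
        out_b, store_b, gas_b = optimised(calldata)
        if out_a != out_b or store_a != store_b or gas_b > gas_a:
            return calldata
    return None


def baseline(calldata):
    """Toy contract: sums the calldata bytes and stores the total in slot 0."""
    total = sum(calldata)
    return total.to_bytes(32, "big"), {0: total}, 21_000 + 16 * len(calldata) + 200


def optimised(calldata):
    """Same behaviour as the baseline with a cheaper (made-up) gas cost."""
    total = sum(calldata)
    return total.to_bytes(32, "big"), {0: total}, 21_000 + 16 * len(calldata) + 150


def broken(calldata):
    """Incorrect 'optimisation' that silently drops the last calldata byte."""
    total = sum(calldata[:-1])
    return total.to_bytes(32, "big"), {0: total}, 21_000 + 16 * len(calldata) + 100
```

Fuzzing of this kind can only find counterexamples; it cannot prove their absence, which is why the Z3 equivalence proof remains the final check.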

⚡ Core Features

  • Zero-Copy Rust Execution Environment: Evaluates the RL reward function by wrapping revm (the fastest Rust EVM) in a shared memory ring buffer. This bypasses Python-Rust FFI serialization bottlenecks to achieve ~100,000 state transitions per second.
  • Dual-Target Cost Functions: Compiles optimally for either Ethereum Mainnet (penalizing SSTORE) or zkEVM rollups (penalizing KECCAK256 and optimizing for PLONK constraint reduction).
  • Multi-Agent RL (MARL): For large DeFi protocols, NeuralYul assigns independent agents to separate Yul functions under a single coordinator, utilizing leave-one-out attribution for accurate credit assignment.
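Leave-one-out attribution, mentioned in the last item above, can be illustrated as follows: each agent's credit is the team reward minus the team reward obtained when that agent's function is left unoptimized. This is a minimal sketch; `leave_one_out_credit`, `toy_team_reward`, the function names, and the gas figures are all hypothetical.

```python
def leave_one_out_credit(team_reward, agents):
    """Credit for agent a = R(all agents) - R(all agents except a)."""
    everyone = frozenset(agents)
    full = team_reward(everyone)
    return {a: full - team_reward(everyone - {a}) for a in agents}


def toy_team_reward(active):
    """Made-up gas savings per Yul function, plus a synergy bonus when
    both `swap` and `mint` are optimized together."""
    savings = {"swap": 500, "mint": 200, "burn": 0}
    reward = sum(savings[name] for name in active)
    if {"swap", "mint"} <= active:
        reward += 100
    return reward
```

With these toy numbers, `leave_one_out_credit(toy_team_reward, ["swap", "mint", "burn"])` assigns 600 to `swap`, 300 to `mint`, and 0 to `burn`. The credits need not sum to the team reward, because the shared synergy bonus is counted for every agent that contributes to it.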

🛠 Developer Experience (DevEx) & Usage

NeuralYul hooks directly into standard development frameworks via a lightweight Rust wrapper, requiring no new security assumptions or workflow changes.

Fast Development Loop (Seconds): Use the embedded C++ Yul-MLGO model for rapid, phase-ordered structural optimization.

forge build --ml-optimize

📦 Repository Layout (Infrastructure)

This repo contains the full infrastructure for the system. The neural networks and the YulCode dataset are intentionally not included — see docs/INFRASTRUCTURE.md for the exact spec→file map and the model boundary in src/neuralyul/models/.

src/neuralyul/
  config.py            # all hyperparameters + 36-d feature layout (single source of truth)
  pdg/                 # Yul Program Dependence Graph parser (AST + CFG + DFG, labelled edges)
  data/                # NFM/EP/SE augmentations + PyG two-view contrastive dataset engine
  models/              # typed interfaces ONLY — the NNs plug in here (no weights shipped)
  env/                 # 24 Yul passes, solc driver, Gymnasium env (reward = gas saved)
  reward/              # zero-copy gas bridge (mock + PyO3) + shared-memory ring buffer client
  correctness/         # differential fuzzer + Z3 equivalence (KECCAK as an uninterpreted fn)
  daemon.py            # FastAPI superoptimizer/DevEx API
executor/              # Rust: SPSC ring buffer + GasBackend (Mock / revm) + PyO3 + worker bin
solc-plugin/           # C++: FeatureExtractor + YulMLRunner (ONNX) + Suite.cpp hook
orchestration/         # Rust Foundry wrapper (neuralyul-forge)

🚀 Quickstart

Everything below runs with only networkx + solc (the gas evaluator falls back to a dependency-free deterministic mock; the real EVM is one Cargo feature away).

nix develop                      # toolchain: python, rust, maturin, solc, z3
pip install -e '.[dev]'          # base install (+ pytest)

# Inspect the pipeline without any model:
neuralyul parse  src/L1-encoder/sample.yul     # PDG topology + edge-type stats
neuralyul passes                               # the 24-action Yul pass space
neuralyul gas    src/L1-encoder/sample.yul dhfoDgvulfn   # compile a sequence, measure gas
neuralyul fuzz   src/L1-encoder/sample.yul     # differential-fuzz baseline vs optimised

pytest                                         # optional-dep tests self-skip

Build the zero-copy Rust executor

cargo build --release --manifest-path executor/Cargo.toml      # lib + worker (mock backend)
maturin develop -m executor/Cargo.toml                         # Python ext (mock)
maturin develop -m executor/Cargo.toml --features revm         # Python ext (REAL EVM)

Run the multi-process zero-copy loop: call RingBufferClient.create(...) from Python, then start neuralyul-worker /tmp/neuralyul_gas_ring, which attaches to the same mmap segment.
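To make the shared-segment idea concrete, here is a single-process sketch of a single-producer/single-consumer ring laid out in an mmap buffer. It is not the repository's `RingBufferClient`: the `ToyRing` class, the header layout, and the u64 payload are assumptions, and Python offers none of the atomic memory ordering a real cross-process ring (such as the Rust executor) would need.

```python
import mmap
import struct

U64 = struct.Struct("<Q")
# Hypothetical layout: [head: u64][tail: u64][slots: u64 * capacity]
HEAD_OFF, TAIL_OFF, DATA_OFF = 0, 8, 16


class ToyRing:
    """Fixed-capacity ring of u64 values stored in an mmap-backed buffer."""

    def __init__(self, capacity):
        self.capacity = capacity
        # Anonymous, zero-filled mapping. A cross-process version would map a
        # shared file instead so that a worker could attach to the same bytes.
        self.buf = mmap.mmap(-1, DATA_OFF + capacity * U64.size)

    def _load(self, offset):
        return U64.unpack_from(self.buf, offset)[0]

    def _slot(self, index):
        return DATA_OFF + (index % self.capacity) * U64.size

    def push(self, value):
        """Producer side: only ever writes the payload slot and `head`."""
        head, tail = self._load(HEAD_OFF), self._load(TAIL_OFF)
        if head - tail == self.capacity:
            return False                             # ring is full
        U64.pack_into(self.buf, self._slot(head), value)
        U64.pack_into(self.buf, HEAD_OFF, head + 1)  # publish after the payload
        return True

    def pop(self):
        """Consumer side: only ever writes `tail`."""
        head, tail = self._load(HEAD_OFF), self._load(TAIL_OFF)
        if head == tail:
            return None                              # ring is empty
        value = self._load(self._slot(tail))
        U64.pack_into(self.buf, TAIL_OFF, tail + 1)
        return value
```

Because values are packed directly into the mapped bytes, a reader on the other side sees them without any serialization step, which is the property the zero-copy bridge relies on.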

Optional capabilities

pip install -e '.[graph]'    # torch + torch_geometric: augmentations & dataset engine
pip install -e '.[rl]'       # gymnasium: the RL environment
pip install -e '.[verify]'   # z3-solver: the formal equivalence gate
pip install -e '.[daemon]'   # fastapi/uvicorn: the superoptimizer daemon

What's excluded: the GIN/PPO/PRM network bodies, their training loops, and the 350k-contract dataset. The infra is wired to typed seams so dropping a trained model in requires no changes to the parser, env, reward bridge, or correctness gate.
