fix(opstack): server receives no mainnet blocks — V4 topic, yamux, identify, OP-spec max message size#801

Open
amunt0 wants to merge 4 commits into a16z:master from Inco-fhevm:fix/opstack-v4-blocks-gossip-topic

@amunt0 amunt0 commented Jun 4, 2026

Problem

The opstack consensus server (opstack/bin/server.rs) receives no blocks on any mainnet OP-stack chain: /latest serves null (or a stale head) indefinitely. Four independent defects cause this. Each was found by running the server against Base mainnet with libp2p_gossipsub=debug and fixing one variable at a time.

1. Subscribed to the wrong gossip topic (original scope of this PR)

BlockHandler subscribes only to the V3 blocks topic:

blocks_v3_topic: IdentTopic::new(format!("/optimism/{chain_id}/2/blocks")),

but op-node publishes each block to exactly one topic version, selected by hardfork (op-node p2p/gossip.go):

if p.cfg.IsIsthmus(timestamp) {
    return p.blocksV4.topic.Publish(ctx, out)   // V4 only — no dual-publish
} else if p.cfg.IsEcotone(timestamp) {
    return p.blocksV3.topic.Publish(ctx, out)
}

Post-Isthmus mainnets publish on V4 (/optimism/<chain_id>/3/blocks) only. Fix: subscribe to V4 alongside V3 (the commitment envelope and signature scheme are identical on both).
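The topic selection above can be sketched in a few lines of std-only Rust. This is an illustration, not the server's code: `BlocksTopicVersion`, `blocks_topic`, `publish_version`, and `subscribed_topics` are hypothetical names, and 8453 (Base mainnet's chain ID) is used only as an example input. Only the two hardfork branches quoted from op-node are modelled; anything earlier returns `None`.

```rust
/// Gossip block topic versions discussed in this PR.
/// Hypothetical enum for illustration; not a type from the server or op-node.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum BlocksTopicVersion {
    V3,
    V4,
}

impl BlocksTopicVersion {
    /// Numeric path segment used in the topic string (V3 -> 2, V4 -> 3).
    fn path_segment(self) -> u8 {
        match self {
            BlocksTopicVersion::V3 => 2,
            BlocksTopicVersion::V4 => 3,
        }
    }
}

/// Builds the topic string in the `/optimism/<chain_id>/<n>/blocks` format.
fn blocks_topic(chain_id: u64, version: BlocksTopicVersion) -> String {
    format!("/optimism/{chain_id}/{}/blocks", version.path_segment())
}

/// Publisher side: exactly one version per block, mirroring the two
/// branches quoted from op-node. Earlier forks are not modelled here.
fn publish_version(is_isthmus: bool, is_ecotone: bool) -> Option<BlocksTopicVersion> {
    if is_isthmus {
        Some(BlocksTopicVersion::V4)
    } else if is_ecotone {
        Some(BlocksTopicVersion::V3)
    } else {
        None
    }
}

/// Subscriber side: listen on both versions so that neither pre- nor
/// post-Isthmus publishers are missed.
fn subscribed_topics(chain_id: u64) -> Vec<String> {
    [BlocksTopicVersion::V3, BlocksTopicVersion::V4]
        .iter()
        .map(|&version| blocks_topic(chain_id, version))
        .collect()
}

fn main() {
    for topic in subscribed_topics(8453) {
        println!("{topic}");
    }
}
```

Because the publisher picks a single version while the subscriber cannot know in advance which fork a given chain has activated, subscribing to both is the cheaper side on which to be permissive.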

2. mplex-only transport — modern op-nodes can't multiplex with us

The server's transport offers mplex only (libp2p 0.51 era). mplex has been deprecated and dropped across the go-libp2p ecosystem; current Base/OP mainnet peers speak yamux. Measured: 7+ minutes, dozens of dials, zero connections established, zero blocks. Fix: bump libp2p to 0.56 and use yamux (chore: update libp2p to 0.56.0 + clippy follow-up, cherry-picked from our integration branch).

3. No identify behaviour — peers close every connection within ~300 ms

With yamux in place connections establish, but go-libp2p based op-nodes close them ~100–300 ms later, remote-initiated, 100% of dials:

libp2p_swarm: Connection closed with error IO(... Error(Right(Closed))) ... total_peers=0

The gossipsub mesh stays at 0 of 8 indefinitely; blocks only leak in when a brand-new peer briefly delivers before hanging up, producing bursts of a few seconds separated by 20–40 min of silence. The missing piece is the identify protocol (/ipfs/id/1.0.0): kona composes ping + gossipsub + identify (+ sync req/resp), whereas this server had only ping + gossipsub. The fix mirrors kona: identify::Config::new("", public_key).

4. 64 KiB default max_transmit_size — one big block silences a peer forever

With identify in place the mesh GRAFTs and blocks stream at full 2 s cadence — until the first Base block larger than 64 KiB (the libp2p gossipsub default) arrives:

libp2p_gossipsub::handler: Failed to read from inbound stream: Failed to encode/decode message

The peer's inbound gossip stream dies and never recovers, while the connection stays up — silent starvation. (The first V4 message we ever captured was 59,502 bytes — already brushing the limit.) Fix: OP spec gossip maximum, 10 MiB — same as op-node's MaxGossipSize and kona's MAX_GOSSIP_SIZE.
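The size arithmetic behind this defect can be checked with a short std-only sketch. The constant and function names below are hypothetical, and the comparison treats the limit as a plain byte count on the message; the real gossipsub check applies to the framed RPC, so actual headroom is somewhat smaller than shown.

```rust
/// 64 KiB, the libp2p gossipsub default cited in the PR description.
/// Hypothetical constant name, for illustration only.
const DEFAULT_MAX_TRANSMIT_SIZE: usize = 64 * 1024;

/// 10 MiB, the OP spec gossip maximum cited in the PR description.
/// Hypothetical constant name, for illustration only.
const OP_MAX_GOSSIP_SIZE: usize = 10 * 1024 * 1024;

/// Returns true if a message of `message_len` bytes fits under the limit.
/// Simplification: ignores framing overhead added around the payload.
fn fits(message_len: usize, max_transmit_size: usize) -> bool {
    message_len <= max_transmit_size
}

/// Bytes left under the limit, or None if the message exceeds it.
fn headroom(message_len: usize, max_transmit_size: usize) -> Option<usize> {
    max_transmit_size.checked_sub(message_len)
}

fn main() {
    // Size of the first V4 message captured, as reported in the PR description.
    let first_v4_message = 59_502;

    println!(
        "default limit: fits = {}, headroom = {:?}",
        fits(first_v4_message, DEFAULT_MAX_TRANSMIT_SIZE),
        headroom(first_v4_message, DEFAULT_MAX_TRANSMIT_SIZE)
    );
    println!(
        "OP spec limit: fits = {}, headroom = {:?}",
        fits(first_v4_message, OP_MAX_GOSSIP_SIZE),
        headroom(first_v4_message, OP_MAX_GOSSIP_SIZE)
    );
}
```

Under these assumptions the captured 59,502-byte message leaves about 6 KB of headroom against the 64 KiB default, which is why a slightly larger block is enough to trip the limit.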

Verification (Base mainnet, A/B, one variable at a time)

| Variant | Result |
| --- | --- |
| V3-only (upstream master) | 19 min, 80 gossip peers, 0 messages, /latest = null |
| + V4 topic (libp2p 0.51/mplex) | 7+ min, dozens of dials, 0 connections |
| + libp2p 0.56/yamux | connections establish, then 100% closed by remote at ~300 ms; ~14 blocks per 10 min |
| + identify | GRAFT lands, full 2 s cadence, until a >64 KiB block kills the stream |
| + 10 MiB max (full fix set, as on this branch) | 416/416 consecutive blocks over 15 min, zero gaps, /latest continuously advancing |
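A "zero gaps" result like the one in the last row can be checked mechanically from the sequence of received block numbers. The sketch below is one possible way to do that and is not the procedure used for the measurements above; `missing_blocks` is a hypothetical helper that assumes the numbers are recorded in ascending order.

```rust
/// Counts block numbers missing between consecutive received heads.
/// Assumes `received` is in ascending order; duplicates or out-of-order
/// entries contribute zero rather than underflowing.
fn missing_blocks(received: &[u64]) -> u64 {
    received
        .windows(2)
        .map(|pair| pair[1].saturating_sub(pair[0]).saturating_sub(1))
        .sum()
}

fn main() {
    // Illustrative block numbers, not data from the PR.
    let continuous = [100, 101, 102, 103];
    let with_gap = [100, 101, 105];

    println!("continuous: {} missing", missing_blocks(&continuous));
    println!("with gap:   {} missing", missing_blocks(&with_gap));
}
```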

The full fix set has been running in production since 2026-06-04 — 6 k8s instances across 2 regions (initially also 3 bare-metal hosts, since consolidated): every instance converges to full 2 s block cadence and holds.

Possibly related: the public base.operationsolarstorm.org / op-mainnet.operationsolarstorm.org consensus endpoints currently return HTTP 502 while base-sepolia.operationsolarstorm.org works — consistent with V3-only subscribers behind the mainnet feeds.

Base mainnet's sequencer publishes signed unsafe heads on the V4 topic
(/optimism/<chain_id>/3/blocks) since Isthmus; subscribers listening
only on the V3 topic (/2/blocks) receive nothing and the consensus
server never serves a head. Matches the symptom of the public
operationsolarstorm mainnet feeds (502 / no data) while base-sepolia
still works. Keep V3 subscription for chains that still publish on it;
the handler verifies the same sequencer-signed commitment envelope on
both.
amaury1093 and others added 3 commits June 4, 2026 23:35
…ge size

Two independent defects starved the consensus server of blocks (heads
went stale for 20-40 min windows, with only short bursts of delivery):

1. No libp2p identify behaviour. go-libp2p based op-nodes close
   connections ~300ms after establishment when /ipfs/id/1.0.0 is
   unsupported (observed: every dial to a Base mainnet peer ended with
   a remote-initiated close; gossipsub mesh stayed at 0 of 8). Add
   identify (empty protocol version, agent "helios"), mirroring kona.

2. Default gossipsub max_transmit_size (64 KiB). The first Base block
   larger than 64 KiB kills the peer's inbound gossipsub stream with
   "Failed to read from inbound stream: Failed to encode/decode
   message" and delivery from that peer never resumes. Set the OP spec
   gossip maximum (10 MiB), same as kona's MAX_GOSSIP_SIZE and
   op-node's MaxGossipSize.

A/B verified against an unpatched instance on Base mainnet: unpatched
received 14 blocks in 10 min (all conns dropped in ~300ms); with
identify connections persist, GRAFT lands, and blocks arrive at full
2s cadence until a >64 KiB block kills the stream; with both fixes
delivery is continuous.
@amunt0 amunt0 changed the title fix(opstack): subscribe to V4 blocks gossip topic (post-Isthmus mainnet chains) fix(opstack): server receives no mainnet blocks — V4 topic, yamux, identify, OP-spec max message size Jun 4, 2026
@socket-security

Review the following changes in direct dependencies. Learn more about Socket for GitHub.

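| Diff | Package | Supply Chain Security | Vulnerability | Quality | Maintenance | License |
| --- | --- | --- | --- | --- | --- | --- |
| Updated | cargo/libp2p@0.51.4 ⏵ 0.56.0 | 83 (-17) | 100 | 93 | 100 | 100 |
| Updated | cargo/libp2p-identity@0.1.3 ⏵ 0.2.13 | 100 | 100 | 93 | 100 | 100 |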

@socket-security

Warning

Review the following alerts detected in dependencies.

According to your organization's Security Policy, it is recommended to resolve "Warn" alerts. Learn more about Socket for GitHub.

| Action | Severity | Alert |
| --- | --- | --- |
| Warn | High | CVE GHSA-vxx9-2994-q338: Yamux vulnerable to remote Panic via malformed Data frame with SYN set and len = 262145 |

Affected versions: < 0.13.10

Patched version: 0.13.10

From: cargo/yamux@0.12.1

Next steps: Take a moment to review the security alert above. Review the linked package source code to understand the potential risk. Ensure the package is not malicious before proceeding. If you're unsure how to proceed, reach out to your security team or ask the Socket team for help at support@socket.dev.

Suggestion: Remove or replace dependencies that include known high severity CVEs. Consumers can use dependency overrides or npm audit fix --force to remove vulnerable dependencies.

Mark the package as acceptable risk. To ignore this alert only in this pull request, reply with the comment @SocketSecurity ignore cargo/yamux@0.12.1. You can also ignore all packages with @SocketSecurity ignore-all. To ignore an alert for all future pull requests, use Socket's Dashboard to change the triage state of this alert.
