solana: multi-tenant deployment #875

Open: kcsongor wants to merge 12 commits into main from cs/solana-multi-host


@kcsongor (Contributor) commented May 7, 2026

No description provided.

@kcsongor force-pushed the cs/solana-multi-host branch 6 times, most recently from 001c6a6 to 73f9855 (May 11, 2026 13:29)
@kcsongor marked this pull request as ready for review May 11, 2026 13:30
@johnsaigle (Collaborator): @claude

github-actions (bot) commented May 12, 2026

Claude finished @johnsaigle's task in 3m 35s.


Review: PR #875 — Solana Multi-Tenant Deployment

  • Read repository guidelines
  • Gather context on key changes
  • Review Solana program changes
  • Review CLI changes
  • Review SDK/TS changes
  • Review test changes
  • Post review summary

Overview

This is a well-architected PR that transforms the Solana NTT program from a singleton-per-program model (v3) into a multi-instance/multi-tenant architecture (v4). Each program can now host multiple independent NTT deployments, where each deployment is identified by a caller-provided Config keypair rather than the program ID.

Key architectural changes:

  • Config account changes from a global PDA (seeds = [b"config"]) to a caller-provided keypair
  • All PDAs now include config.key() in their seeds (after the static seed prefix) for cross-instance isolation
  • On-the-wire manager identity changes from program ID → Config pubkey
  • Ownership transfer is decoupled from BPF loader upgrade authority (pure data mutations)
  • New OutboxItem.manager field binds outbox items to their source instance
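To make the seed change concrete, here is a minimal TypeScript model of the token authority seeds before and after this PR. It only builds the seed lists and does not derive addresses; the literal seed bytes `token_authority` are an assumption standing in for `crate::TOKEN_AUTHORITY_SEED`.

```typescript
// Hypothetical model: builds PDA seed lists only. It does not call Solana's
// PublicKey.findProgramAddressSync, so no bump or off-curve check is computed.
const enc = new TextEncoder();

// Assumption: the literal bytes of crate::TOKEN_AUTHORITY_SEED.
const TOKEN_AUTHORITY_SEED = enc.encode("token_authority");

type Seeds = Uint8Array[];

// v3: a single token authority PDA per program.
function tokenAuthoritySeedsV3(): Seeds {
  return [TOKEN_AUTHORITY_SEED];
}

// v4: the instance's 32-byte Config pubkey is appended, so every instance
// under the same program derives a distinct token authority.
function tokenAuthoritySeedsV4(config: Uint8Array): Seeds {
  if (config.length !== 32) throw new Error("config pubkey must be 32 bytes");
  return [TOKEN_AUTHORITY_SEED, config];
}

// Byte-wise comparison of two seed lists.
function sameSeeds(a: Seeds, b: Seeds): boolean {
  return (
    a.length === b.length &&
    a.every((s, i) => s.length === b[i].length && s.every((byte, j) => byte === b[i][j]))
  );
}
```

Two instances under the same program therefore never share a token authority, while the same Config always maps to the same seeds.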

The PR is ~15k lines with changes across 89 files. Here are my findings:


Solana Program — Positive Findings

Cross-instance isolation is thorough. Every critical PDA (outbox rate limit, inbox rate limit, peers, registered transceivers, token authority, session authority, emitter, inbox items, LUT) correctly incorporates config.key().as_ref() in its seeds. I verified this across all instruction files.

OutboxItem binding is correct. The new outbox_item.manager == config.key() constraint in both mark_outbox_item_as_released.rs:23 and release_outbound.rs:22 prevents transceivers from one instance from releasing outbox items belonging to another — this is critical for multi-instance security.

Redeem properly validates the recipient manager identity against config.key() at redeem.rs:43, preventing cross-instance message replay.
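The two instance-binding checks above can be modeled as plain predicates. This is a simplified sketch, assuming pubkeys are compared as strings and the per-transceiver `released` bitmap is represented as a boolean array; in the program these are Anchor `constraint`s on the accounts.

```typescript
// Hypothetical, simplified model of two v4 guards. Pubkeys are plain strings
// here; the program compares 32-byte keys inside account constraints.
interface OutboxItem {
  manager: string; // Config pubkey of the instance that created the item
  released: boolean[]; // per-transceiver release flags (a bitmap on-chain)
}

// Mirrors `outbox_item.manager == config.key()` plus the "not yet released" check.
function canRelease(item: OutboxItem, config: string, transceiverId: number): boolean {
  return item.manager === config && !(item.released[transceiverId] ?? false);
}

// Mirrors the redeem check: the message must name this instance's Config as recipient.
function isIntendedRecipient(recipientNttManager: string, config: string): boolean {
  return recipientNttManager === config;
}
```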

Ownership transfer simplification is clean. Removing the BPF loader CPI (~90 lines) and replacing it with a pure data mutation in transfer_ownership.rs is a significant reduction in complexity and attack surface.


Issues & Suggestions

1. set-mint-authority.ts — Missing instance awareness on undeployed v4 path (Medium)

cli/src/commands/set-mint-authority.ts:152:

const tokenAuthority = NTT.pdas(managerKey).tokenAuthority();

When solanaNtt is undefined (undeployed program path, major === -1), and the user intends to deploy a v4 program, NTT.pdas(managerKey) computes the v3 token authority PDA (no instance scope). This would be incorrect for v4 deployments — the authority should be NTT.pdas(managerKey, instancePubkey).tokenAuthority().

The deployed path correctly uses solanaNtt.pdas (which has the instance built in), but the undeployed path doesn't accept an --instance flag. Consider either:

  • Adding --instance to set-mint-authority for undeployed v4 programs, or
  • Documenting that this command's --manager/--token mode is v3-only


2. Silent error swallowing in addSolanaInstance (Low-Medium)

cli/src/solana/deploy.ts:677-688:

try {
  await signSendWait(ch, initTxs, signer.signer);
} catch (e: any) {
  console.error(e.logs);
}

Both the initialize and transceiver registration steps catch errors and only log e.logs. If e is not a Solana SendTransactionError, .logs will be undefined, and the function returns "success" with a partially initialized instance. The same pattern exists in the legacy deploySvm (lines 517-528) — but introducing it in new code is an opportunity to do better.

Consider at minimum logging the full error message:

} catch (e: any) {
  console.error("Initialize failed:", e.logs ?? e.message ?? e);
}

Or better: distinguish between recoverable failures (e.g., LUT initialization when wormhole bridge isn't loaded) and non-recoverable ones (initialization itself failing).
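A sketch of that second option, assuming each transaction batch is wrapped in a hypothetical `Step` with a `recoverable` flag: initialization failures abort with the full error, while LUT-style failures are collected as warnings. `describeError` handles both `SendTransactionError.logs` and a plain `Error.message`.

```typescript
// Hypothetical sketch: `Step`, `runSteps`, and the step names are illustrative;
// the real deploy code calls signSendWait for each transaction batch.
type Step = { name: string; recoverable: boolean; run: () => Promise<void> };

// Render any thrown value. Solana's SendTransactionError carries `logs`;
// most other errors only have `message`.
function describeError(e: unknown): string {
  if (e && typeof e === "object") {
    const err = e as { logs?: unknown; message?: unknown };
    if (Array.isArray(err.logs)) return err.logs.join("\n");
    if (typeof err.message === "string") return err.message;
  }
  return String(e);
}

// Run steps in order. Non-recoverable failures abort; recoverable ones
// (e.g. LUT setup without the wormhole core bridge) become warnings.
async function runSteps(steps: Step[]): Promise<string[]> {
  const warnings: string[] = [];
  for (const step of steps) {
    try {
      await step.run();
    } catch (e) {
      const detail = `${step.name} failed: ${describeError(e)}`;
      if (!step.recoverable) throw new Error(detail);
      warnings.push(detail);
    }
  }
  return warnings;
}
```

The caller can print the returned warnings and still report success, while a failed `initialize` propagates instead of leaving a half-initialized instance behind.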


3. Instance keypair written to relative path (Low)

cli/src/solana/deploy.ts:573:

const generatedPath = `${ch.chain}-instance.json`;

When no --instance-key is provided, the generated keypair is written to the current working directory with a name like Solana-instance.json. This is fragile if the CLI is invoked from different directories. Consider writing to the deployment config directory instead, or at least using an absolute path in the log message.
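One way to follow that suggestion, assuming the CLI knows the path of its deployment file (the `deploymentFile` parameter here is hypothetical): resolve the keypair path next to that file and log the absolute result.

```typescript
import * as path from "node:path";

// Hypothetical helper: place the generated Instance keypair next to the
// deployment file rather than in whatever directory the CLI was started from.
// `deploymentFile` is an assumed input; how the CLI locates its config may differ.
function instanceKeypairPath(deploymentFile: string, chain: string): string {
  const dir = path.dirname(path.resolve(deploymentFile));
  return path.join(dir, `${chain}-instance.json`);
}
```

Because `path.resolve` anchors relative inputs to the current directory once, the returned path is always absolute and can be logged as-is.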

4. OutboxRateLimit comment is now inaccurate (Nit)

solana/programs/example-native-token-transfers/src/queue/outbox.rs:54-55:

/// Global rate limit for all outbound transfers to all chains.
/// NOTE: only one of this account can exist, so we don't need to check the PDA.

In v4, this is no longer true — each instance has its own outbox rate limit (scoped by config.key()). Multiple such accounts can exist. The comment should be updated.


5. Consider documenting the migration path more explicitly (Suggestion)

upgradeBarriers.ts correctly blocks v3→v4 in-place upgrades with a helpful message. However, the operational steps for migrating an existing v3 deployment to v4 (fresh deploy, re-peer, migrate liquidity) might warrant a brief note in the barrier's reason string or a linked doc.
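As an illustration of what such a reason string could contain, here is a hypothetical barrier check. `checkUpgrade` and `UpgradeCheck` are made-up names and do not reflect the real `canUpgrade()` signature; the three steps are the ones listed above.

```typescript
// Hypothetical barrier shape: the refusal message itself lists the migration steps.
interface UpgradeCheck {
  allowed: boolean;
  reason?: string;
}

function checkUpgrade(fromMajor: number, toMajor: number): UpgradeCheck {
  if (fromMajor <= 3 && toMajor >= 4) {
    return {
      allowed: false,
      reason: [
        "In-place upgrades from v3 to v4 are not supported. To migrate:",
        "  1. Deploy a fresh v4 instance.",
        "  2. Re-peer the new instance with its remote peers.",
        "  3. Migrate liquidity from the v3 deployment.",
      ].join("\n"),
    };
  }
  return { allowed: true };
}
```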


Tests Assessment

The new test coverage is solid:

  • multi_instance.test.ts — Tests two independent instances under the same program, verifies PDA isolation, state isolation (pausing one doesn't affect the other), and cross-instance VAA replay protection. This is the most important new test.
  • upgrade_authority_decoupling.test.ts — Verifies that ownership transfer no longer modifies the BPF loader's upgrade authority.
  • e2e-solana.test.ts — End-to-end CLI test covering full deployment lifecycle, --instance-of flow, v3→v4 upgrade rejection, and solana token-authority --instance command.

Minor test gaps:

  • No explicit test for keypair mismatch at initialization (SDK validation exists but isn't tested)
  • No test verifying per-instance rate limit independence under actual transfer load

Architecture Summary

| Aspect | v3 | v4 |
|---|---|---|
| Instances per program | 1 | Multiple |
| Instance identity | Program ID | Config keypair |
| PDA scoping | Global | Per-instance (config.key()) |
| Ownership | Tied to BPF upgrade authority | Independent |
| Manager identity (on-wire) | Program ID bytes | Config pubkey bytes |
| Ownership transfer | BPF loader CPI (~90 LOC) | Pure data mutation (~3 LOC) |

Overall this is a well-executed architectural change with strong cross-instance isolation. The issues noted above are relatively minor — the most actionable one is the set-mint-authority instance awareness gap.


@johnsaigle (Collaborator) left a comment

Needs rebase

kcsongor added 9 commits June 3, 2026 14:47
- bind OutboxItem to manager instance
- decouple BPF upgrade authority from Config owner

Multi-tenant Solana NTT (>= v4) PDAs are scoped by the per-deployment
Instance pubkey, so every CLI surface that reads or writes a manager
needs to know it:

- ChainConfig grows an optional `instance` field; SolanaDeploymentResult
  threads it back from `deploy()`.
- pullChainConfig / nttFromManager take an optional `solanaInstance`
  arg, plumbed through every caller (add-chain, upgrade, clone,
  transfer-ownership, set-mint-authority, solana subcommands,
  config-mgmt's pull loop).
- deploySvm gains a multi-tenant branch: generates (or loads via
  --instance-key) an Instance keypair, sets contracts.ntt.instance
  before SDK construction, co-signs `initialize`, and returns the
  instance pubkey alongside the program address. addSolanaInstance
  no longer rethrows on initializeOrUpdateLUT failure (matches
  deploySvm's swallow-on-LUT shape; lets dev environments without
  the wormhole core bridge still write deployment.json).
- `ntt solana token-authority --instance <pubkey>` derives the
  per-instance PDA before mint-authority handoff.

SolanaNtt's constructor now refuses the v4-without-instance and
v3-with-instance footguns at construction time (the PDA factory
accepts an optional config arg for back-compat, so without this the
SDK silently falls back to legacy singleton derivations against a
multi-tenant manager). Old "isV4" branches renamed to "multiTenant"
in cli/src/solana/deploy.ts to capture the property we're checking.
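The construction-time guard described above amounts to a two-way check between the program's major version and the presence of an instance pubkey. A minimal sketch, with `major` and `instance` as assumed stand-ins for what `SolanaNtt` actually reads:

```typescript
// Hypothetical model of the construction-time check; not the SDK's actual code.
function assertInstanceMatchesVersion(major: number, instance?: string): void {
  const multiTenant = major >= 4;
  if (multiTenant && !instance) {
    // Without this, the PDA factory would silently fall back to v3 singleton seeds.
    throw new Error("multi-tenant (>= v4) manager requires an instance pubkey");
  }
  if (!multiTenant && instance) {
    throw new Error("legacy (v3) manager does not take an instance pubkey");
  }
}
```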

Local cli/test/solana.sh:
- export COPYFILE_DISABLE=1 (macOS' AppleDouble metadata files break
  solana-test-validator's genesis-archive unpacker).
- airdrop with --commitment finalized — `solana program deploy`
  uses --commitment finalized and was racing finalization on a
  fresh airdrop, surfacing as a bogus "insufficient funds" against
  a 50-SOL account.
- new v4 multi-tenant section: asserts `ntt upgrade` is blocked at
  the v3->v4 boundary by canUpgrade(), then patches Anchor.toml +
  lib.rs to declare a locally-keypair'd id, rebuilds, deploys, and
  walks `add-chain --instance-of` end-to-end. cleanup() trap
  unconditionally restores the patched source files on exit.

cli/e2e/e2e-solana.test.ts spins up its own `solana-test-validator`
loaded with the wormhole core bridge + post-message shim + verify-vaa
shim as genesis programs (mirroring solana/Anchor.toml's [[test.genesis]]
setup) plus the local v4 NTT .so at its declared id, then drives `ntt`
end-to-end via Bun.spawn. Three tests:

- `ntt init Mainnet` writes deployment.json with the expected shape.
- `add-chain --instance-of` creates a multi-tenant Instance under the
  pre-loaded program (skipping deploy) and persists the `instance`
  pubkey alongside `manager` in deployment.json.
- `ntt upgrade Solana --ver 4.0.0` from a synthetic v3 deployment.json
  is blocked by canUpgrade() with the migration-steer error message.

Logging knobs:
  NTT_E2E_DEBUG=1    one progress line per `ntt` invocation
  NTT_E2E_VERBOSE=1  full stdout+stderr per invocation
On failure, full stdout+stderr is dumped through the thrown error.
The validator's stdout/stderr is unconditionally appended to
/tmp/ntt-e2e-validator.log for `tail -f`-style real-time inspection.
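A hypothetical sketch of how the two knobs and the failure dump could fit together. `Invocation`, `logLinesFor`, and `checkInvocation` are illustrative names, not the helpers in `e2e-solana.test.ts`, and the env map is passed in explicitly so the logic is testable.

```typescript
type Env = Record<string, string | undefined>;

interface Invocation {
  args: string[];
  stdout: string;
  stderr: string;
  exitCode: number;
}

// Lines to print after an `ntt` invocation, depending on the knobs.
function logLinesFor(inv: Invocation, env: Env): string[] {
  const lines: string[] = [];
  if (env.NTT_E2E_DEBUG === "1") {
    lines.push(`[e2e] ntt ${inv.args.join(" ")} -> exit ${inv.exitCode}`);
  }
  if (env.NTT_E2E_VERBOSE === "1") {
    lines.push(inv.stdout, inv.stderr);
  }
  return lines;
}

// Failures always carry the full output, regardless of the knobs.
function checkInvocation(inv: Invocation): void {
  if (inv.exitCode !== 0) {
    throw new Error(
      `ntt ${inv.args.join(" ")} exited ${inv.exitCode}\n--- stdout ---\n${inv.stdout}\n--- stderr ---\n${inv.stderr}`,
    );
  }
}
```

In a real run, `logLinesFor(result, process.env)` would be called after each `Bun.spawn` completes.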

Per-test timeouts are set in-file so `bun test cli/e2e/e2e-solana.test.ts`
runs without a `--timeout` flag: validator boot ~10s, full add-chain
~70s, upgrade-block <1s.

cli/src/index.ts: `.parseAsync().then(() => process.exit(0))` instead
of `.parse()`. Without the explicit exit, the Solana SDK's `Connection`
leaves a websocket subscription open after a successful command and
the CLI hangs indefinitely waiting for an event-loop drain that won't
come. This bites real users too — `ntt add-chain`/`upgrade`/`push`
exiting cleanly is what everyone expects.
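The exit pattern can be sketched in isolation. `runCli` and `exit` are injected here only so the example runs without terminating the process; in the CLI they correspond to the `.parseAsync()` call and `process.exit`. The non-zero exit on failure is an assumption added for the sketch, not something the commit describes.

```typescript
async function main(runCli: () => Promise<void>, exit: (code: number) => void): Promise<void> {
  try {
    await runCli();
  } catch (e) {
    console.error(e);
    exit(1); // assumption: report failures with a non-zero code
    return;
  }
  // Exit explicitly: an open websocket subscription would otherwise keep the
  // event loop alive after a successful command.
  exit(0);
}
```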

.github/workflows/cli.yml: adds `test-cli-solana-e2e` job mirroring
solana.yml's `anchor-test` setup (bun 1.3.4, solana 1.18.26, anchor
0.29.0) plus `make sdk` to produce the v4 .so, then runs the bun
suite. Uploads /tmp/ntt-e2e-validator.log as an artifact on failure
so CI-only flakes are debuggable.

The CLI imports several @wormhole-foundation/sdk-*-ntt workspace
packages that resolve via bun's workspace symlinks under root
node_modules. Without 'bun ci' at the repo root, those symlinks
don't exist and 'ntt' fails on first import:

  Cannot find module '@wormhole-foundation/sdk-evm-ntt'
    from cli/src/index.ts

Mirrors the pattern in test-cli-unit, which already does 'bun ci'
first.

`bun run --cwd cli test:e2e` was a glob over `e2e/*.test.ts`, which
meant the EVM job in cli.yml started running the new Solana e2e
suite too. The EVM CI runner has no `solana-test-validator`, so
the suite errored out in beforeAll and the job failed.

Split into:
  test:e2e:evm     — anvil-only (used by the EVM CI job)
  test:e2e:solana  — solana-test-validator-only (used locally;
                     test-cli-solana-e2e runs the file directly)
  test:e2e         — both, sequentially (for local 'run everything')

`make sdk` calls `bun run build:solana` which only builds
sdk-definitions-ntt + sdk-solana-ntt. The CLI also imports
@wormhole-foundation/sdk-evm-ntt and sdk-sui-ntt — without their
dist/ populated, bun resolves the workspace symlink to a package
whose main/module fields point at non-existent files and the
resolver surfaces it as 'Cannot find module sdk-evm-ntt'.

Replace `make sdk` with `make anchor-build` (produces the .so
and patches the IDL — what we actually need from the solana side)
plus `bun run build` at the repo root, which builds every
workspace package's TypeScript. Mirrors what cli/install.sh does
for the EVM e2e job.
@kcsongor force-pushed the cs/solana-multi-host branch from 0ad1b81 to 44239cf (June 3, 2026 12:55)

@johnsaigle (Collaborator) left a comment

Two potential issues that I'd appreciate clarification on. 🙏🏻

I have a few design questions as well -- maybe they would be best recorded as GitHub issues or else in a design document somewhere.

  • Have you considered whether these changes a) are compatible with Fogo and b) affect SVM <-> SVM interactions? I haven't considered either in my review because I wanted to check first whether it's worth spending time on.


#[account(
- seeds = [crate::TOKEN_AUTHORITY_SEED],
+ seeds = [crate::TOKEN_AUTHORITY_SEED, config.key().as_ref()],

@johnsaigle (Collaborator), Jun 3, 2026

My understanding is that the inbox_item should also be validated against the instance (config) key. Otherwise, it seems possible to take an InboxItem created on Instance A and pass it through the release flow on Instance B. The recipient encoded in the message would not be altered, but the funds would be minted/unlocked out of the wrong instance. (This also relies on both instances using the same token mint; maybe that disqualifies this scenario? If so, we should document it explicitly.)

It looks like OutboxItem was updated so I think we just need something similar for the InboxItem.

Here's a test I was playing with -- if useful, we could modify the logic slightly to ensure that the inbox item can't be used on the wrong instance.

(test under the multi_instance.test.ts file)

  it("demonstrates cross-instance release_inbound_mint uses an inbox item from another instance", async () => {
    const vaaForB = makeInboundVaa(
      "xinst-release-bug",
      instanceB.publicKey,
      payer.publicKey
    );

    // Prepare a valid, approved InboxItem for instance B, but do not release it.
    await redeemWithoutRelease(nttB, vaaForB, sender, signer);

    const recipientA = spl.getAssociatedTokenAddressSync(
      testMintA.address,
      payer.publicKey,
      true,
      TOKEN_PROGRAM
    );
    await spl.getOrCreateAssociatedTokenAccount(
      $.connection,
      payer,
      testMintA.address,
      payer.publicKey,
      true,
      undefined,
      undefined,
      TOKEN_PROGRAM,
      spl.ASSOCIATED_TOKEN_PROGRAM_ID
    );
    await assert.tokenBalance($.connection, recipientA).equal(0);

    // This should be rejected: config/mint/token_authority/custody are from A,
    // while inbox_item is the approved PDA for B. The current program accepts it.
    const maliciousReleaseIx = await nttA.program.methods
      .releaseInboundMint({ revertOnDelay: false, revertWhenNotReady: false })
      .accounts({
        common: {
          payer: payer.publicKey,
          config: { config: nttA.pdas.configAccount() },
          inboxItem: nttB.pdas.inboxItemAccount(
            "Ethereum",
            vaaForB.payload.nttManagerPayload
          ),
          recipient: recipientA,
          mint: testMintA.address,
          tokenAuthority: nttA.pdas.tokenAuthority(),
          tokenProgram: TOKEN_PROGRAM,
          custody: await NTT.custodyAccountAddress(
            nttA.pdas,
            await nttA.getConfig()
          ),
        },
        multisigTokenAuthority: multisigTokenAuthorityA,
      })
      .instruction();

    await $.sendAndConfirm(maliciousReleaseIx, payer);

    // 25_000 at 8 decimals released to a 9-decimal mint becomes 250_000 units.
    await assert.tokenBalance($.connection, recipientA).equal(250_000);
  }); 
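A minimal model of the check this test is asking for, mirroring the `OutboxItem.manager` binding: the release path rejects an InboxItem that was not created for the Config being used. The `manager` field on InboxItem is an assumption for illustration; re-deriving the InboxItem PDA from `config` and comparing addresses would enforce the same property.

```typescript
// Hypothetical model; field names other than the concept of "manager" binding are illustrative.
interface InboxItem {
  manager: string; // Config pubkey of the instance that approved the item
  approved: boolean;
  amount: number;
}

// Release only if the item belongs to the Config passed to the instruction.
function releaseInbound(item: InboxItem, config: string): number {
  if (item.manager !== config) throw new Error("InboxItem belongs to another instance");
  if (!item.approved) throw new Error("InboxItem is not approved for release");
  return item.amount;
}
```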

@kcsongor (Contributor, Author):

This is valid; added a check for this!

space = 8 + ValidatedTransceiverMessage::<TransceiverMessageData<NativeTokenTransfer<Payload>>>::INIT_SPACE,
seeds = [
ValidatedTransceiverMessage::<TransceiverMessageData<NativeTokenTransfer<Payload>>>::SEED_PREFIX,
config.key().as_ref(),

@johnsaigle (Collaborator)

I think we ought to add a check here that the recipient_ntt_manager matches the instance key. Otherwise, I think you can take a VAA intended for Instance A and create the transceiver message for Instance B, which incorrectly creates that transceiver message.

I believe the impact here is low-to-nil for a few reasons:

  • The existing seeds include the emitter chain and the message ID
  • Solana only allows one peer per Wormhole emitter chain
  • The IDs increase monotonically on the sender side

This should suffice to make the account resulting from the seeds truly unique. However if all of the above weren't true, it would be possible to e.g. relay messages sent from EVM peers intended for given SVM instances to the wrong SVM instance, and potentially front-run transceiver message accounts and DoS them via the init constraint.

Let me know if you agree with this read. IMO this is definitely worth a comment because understanding the security here relies on keeping a lot of NTT subtlety in your head.
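The suggested check is a byte comparison between the message's `recipient_ntt_manager` and this instance's Config pubkey. A minimal sketch, assuming both are available as 32-byte arrays; the error string borrows the existing `NTTError::InvalidRecipientNttManager` name.

```typescript
// Constant-length byte comparison of two keys.
function bytesEqual(a: Uint8Array, b: Uint8Array): boolean {
  if (a.length !== b.length) return false;
  for (let i = 0; i < a.length; i++) {
    if (a[i] !== b[i]) return false;
  }
  return true;
}

// Hypothetical model of the receive_message check: a VAA addressed to one
// instance cannot be used to create a transceiver message for another.
function checkRecipientManager(recipientNttManager: Uint8Array, config: Uint8Array): void {
  if (config.length !== 32 || !bytesEqual(recipientNttManager, config)) {
    throw new Error("InvalidRecipientNttManager");
  }
}
```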

@johnsaigle (Collaborator)

This should be applied to both files:

solana/programs/example-native-token-transfers/src/transceivers/wormhole/instructions/receive_message.rs
solana/programs/ntt-transceiver/src/wormhole/instructions/receive_message.rs

)]
program_data: Account<'info, ProgramData>,
/// The owner of the new instance. Distinct from the program's upgrade
/// authority — see the v4 trust-model note in the README.

@johnsaigle (Collaborator)

There are no README changes in this PR. Maybe the design document could be included in this PR? Easy to do in a follow-up PR though.


#[account(
- seeds = [crate::TOKEN_AUTHORITY_SEED],
+ seeds = [crate::TOKEN_AUTHORITY_SEED, config.key().as_ref()],

@johnsaigle (Collaborator)

@claude Verify whether the TS files in the SDK are correctly deriving the token and mint authorities near e.g. cli/src/commands/set-mint-authority.ts:152.

This change makes it so the instance (config) key is part of the seeds. The relevant code is not obviously shown in the diff so when proving/disproving this claim, quote the relevant code directly and include line numbers.

@johnsaigle (Collaborator)

Lol Claude emoji reacted but didn't complete I guess?

{ chain, address: toUniversal(chain, chainConfig.manager) },
- overrides
+ overrides,
+ chainConfig.instance

@johnsaigle (Collaborator)

Further down in this file, the multisig is initialized with the token_authority based on the manager key. Should this include the instance key?

constraint = ValidatedTransceiverMessage::<NativeTokenTransfer<Payload>>::message(&transceiver_message.try_borrow_data()?[..])?.ntt_manager_payload().payload.to_chain == config.chain_id @ NTTError::InvalidChainId,
// check that we're the intended recipient
constraint = ValidatedTransceiverMessage::<NativeTokenTransfer<Payload>>::message(&transceiver_message.try_borrow_data()?[..])?.recipient_ntt_manager() == crate::ID.to_bytes() @ NTTError::InvalidRecipientNttManager,
// v4: the on-the-wire manager identity is the instance's `config` pubkey.

@johnsaigle (Collaborator)

I think we need an additional seed check here.

Previously, when a transceiver could have only a single instance associated with it, this was safe. Now, I think a maliciously crafted transceiver_message from another deployment on the same transceiver could be used. The id, recipient manager, source manager, and destination chain are all controllable via a crafted VAA, since this data is just in the body.

mut,
constraint = !outbox_item.released.get(transceiver.id)? @ NTTError::MessageAlreadySent,
)]
pub outbox_item: Account<'info, OutboxItem>,

@johnsaigle (Collaborator)

MarkOutboxItemAsReleased already checks that outbox_item.manager == config.key(). Could we add this here just for a sanity check? It's in the standalone version already.

#[account(
constraint = transceiver.transceiver_address == crate::ID,
constraint = config.enabled_transceivers.get(transceiver.id)? @ NTTError::DisabledTransceiver
)]

@johnsaigle (Collaborator)

Could we add a seed derivation for this? It's already checked in MarkOutboxItemAsReleased via the CPI, but it'd be good to add a second layer as a sanity check; the standalone version already does this.
