solana: multi-tenant deployment#875
Conversation
001c6a6 to
73f9855
Compare
|
Claude finished @johnsaigle's task in 3m 35s —— View job Review: PR #875 — Solana Multi-Tenant Deployment
OverviewThis is a well-architected PR that transforms the Solana NTT program from a singleton-per-program model (v3) into a multi-instance/multi-tenant architecture (v4). Each program can now host multiple independent NTT deployments, where each deployment is identified by a caller-provided Key architectural changes:
The PR is ~15k lines with changes across 89 files. Here are my findings: Solana Program — Positive FindingsCross-instance isolation is thorough. Every critical PDA (outbox rate limit, inbox rate limit, peers, registered transceivers, token authority, session authority, emitter, inbox items, LUT) correctly incorporates OutboxItem binding is correct. The new Redeem properly validates the recipient manager identity against Ownership transfer simplification is clean. Removing the BPF loader CPI (~90 lines) and replacing it with a pure data mutation in Issues & Suggestions1.
|
| Aspect | v3 | v4 |
|---|---|---|
| Instances per program | 1 | Multiple |
| Instance identity | Program ID | Config keypair |
| PDA scoping | Global | Per-instance (config.key()) |
| Ownership | Tied to BPF upgrade authority | Independent |
| Manager identity (on-wire) | Program ID bytes | Config pubkey bytes |
| Ownership transfer | BPF loader CPI (~90 LOC) | Pure data mutation (~3 LOC) |
Overall this is a well-executed architectural change with strong cross-instance isolation. The issues noted above are relatively minor — the most actionable one is the set-mint-authority instance awareness gap.
- bind OutboxItem to manager instance - decouple BPF upgrade authority from Config owner
Multi-tenant Solana NTT (>= v4) PDAs are scoped by the per-deployment Instance pubkey, so every CLI surface that reads or writes a manager needs to know it: - ChainConfig grows an optional `instance` field; SolanaDeploymentResult threads it back from `deploy()`. - pullChainConfig / nttFromManager take an optional `solanaInstance` arg, plumbed through every caller (add-chain, upgrade, clone, transfer-ownership, set-mint-authority, solana subcommands, config-mgmt's pull loop). - deploySvm gains a multi-tenant branch: generates (or loads via --instance-key) an Instance keypair, sets contracts.ntt.instance before SDK construction, co-signs `initialize`, and returns the instance pubkey alongside the program address. addSolanaInstance no longer rethrows on initializeOrUpdateLUT failure (matches deploySvm's swallow-on-LUT shape; lets dev environments without the wormhole core bridge still write deployment.json). - `ntt solana token-authority --instance <pubkey>` derives the per-instance PDA before mint-authority handoff. SolanaNtt's constructor now refuses the v4-without-instance and v3-with-instance footguns at construction time (the PDA factory accepts an optional config arg for back-compat, so without this the SDK silently falls back to legacy singleton derivations against a multi-tenant manager). Old "isV4" branches renamed to "multiTenant" in cli/src/solana/deploy.ts to capture the property we're checking. Local cli/test/solana.sh: - export COPYFILE_DISABLE=1 (macOS' AppleDouble metadata files break solana-test-validator's genesis-archive unpacker). - airdrop with --commitment finalized — `solana program deploy` uses --commitment finalized and was racing finalization on a fresh airdrop, surfacing as a bogus "insufficient funds" against a 50-SOL account. - new v4 multi-tenant section: asserts `ntt upgrade` is blocked at the v3->v4 boundary by canUpgrade(), then patches Anchor.toml + lib.rs to declare a locally-keypair'd id, rebuilds, deploys, and walks `add-chain --instance-of` end-to-end. cleanup() trap unconditionally restores the patched source files on exit.
cli/e2e/e2e-solana.test.ts spins up its own `solana-test-validator` loaded with the wormhole core bridge + post-message shim + verify-vaa shim as genesis programs (mirroring solana/Anchor.toml's [[test.genesis]] setup) plus the local v4 NTT .so at its declared id, then drives `ntt` end-to-end via Bun.spawn. Three tests: - `ntt init Mainnet` writes deployment.json with the expected shape. - `add-chain --instance-of` creates a multi-tenant Instance under the pre-loaded program (skipping deploy) and persists the `instance` pubkey alongside `manager` in deployment.json. - `ntt upgrade Solana --ver 4.0.0` from a synthetic v3 deployment.json is blocked by canUpgrade() with the migration-steer error message. Logging knobs: NTT_E2E_DEBUG=1 one progress line per `ntt` invocation NTT_E2E_VERBOSE=1 full stdout+stderr per invocation On failure, full stdout+stderr is dumped through the thrown error. The validator's stdout/stderr is unconditionally appended to /tmp/ntt-e2e-validator.log for `tail -f`-style real-time inspection. Per-test timeouts are set in-file so `bun test cli/e2e/e2e-solana.test.ts` runs without a `--timeout` flag: validator boot ~10s, full add-chain ~70s, upgrade-block <1s. cli/src/index.ts: `.parseAsync().then(() => process.exit(0))` instead of `.parse()`. Without the explicit exit, the Solana SDK's `Connection` leaves a websocket subscription open after a successful command and the CLI hangs indefinitely waiting for an event-loop drain that won't come. This bites real users too — `ntt add-chain`/`upgrade`/`push` exiting cleanly is what everyone expects. .github/workflows/cli.yml: adds `test-cli-solana-e2e` job mirroring solana.yml's `anchor-test` setup (bun 1.3.4, solana 1.18.26, anchor 0.29.0) plus `make sdk` to produce the v4 .so, then runs the bun suite. Uploads /tmp/ntt-e2e-validator.log as an artifact on failure so CI-only flakes are debuggable.
The CLI imports several @wormhole-foundation/sdk-*-ntt workspace
packages that resolve via bun's workspace symlinks under root
node_modules. Without 'bun ci' at the repo root, those symlinks
don't exist and 'ntt' fails on first import:
Cannot find module '@wormhole-foundation/sdk-evm-ntt'
from cli/src/index.ts
Mirrors the pattern in test-cli-unit, which already does 'bun ci'
first.
`bun run --cwd cli test:e2e` was a glob over `e2e/*.test.ts`, which
meant the EVM job in cli.yml started running the new Solana e2e
suite too. The EVM CI runner has no `solana-test-validator`, so
the suite errored out in beforeAll and the job failed.
Split into:
test:e2e:evm — anvil-only (used by the EVM CI job)
test:e2e:solana — solana-test-validator-only (used locally;
test-cli-solana-e2e runs the file directly)
test:e2e — both, sequentially (for local 'run everything')
`make sdk` calls `bun run build:solana` which only builds sdk-definitions-ntt + sdk-solana-ntt. The CLI also imports @wormhole-foundation/sdk-evm-ntt and sdk-sui-ntt — without their dist/ populated, bun resolves the workspace symlink to a package whose main/module fields point at non-existent files and the resolver surfaces it as 'Cannot find module sdk-evm-ntt'. Replace `make sdk` with `make anchor-build` (produces the .so and patches the IDL — what we actually need from the solana side) plus `bun run build` at the repo root, which builds every workspace package's TypeScript. Mirrors what cli/install.sh does for the EVM e2e job.
0ad1b81 to
44239cf
Compare
johnsaigle
left a comment
There was a problem hiding this comment.
Two potential issues that I'd appreciate clarification on. 🙏🏻
I have a few design questions as well -- maybe they would be best recorded as GitHub issues or else in a design document somewhere.
- Have you considered whether these changes a) are compatible with Fogo or b) how this change impacts SVM <-> SVM interactions? I haven't considered this in my review because I wanted to check whether it's worth spending time on first.
|
|
||
| #[account( | ||
| seeds = [crate::TOKEN_AUTHORITY_SEED], | ||
| seeds = [crate::TOKEN_AUTHORITY_SEED, config.key().as_ref()], |
There was a problem hiding this comment.
My understanding is that the inbox_item should also be validated against the instance (config) key. Otherwise, it seems possible to pass an InboxItem on Instance A and pass it through the release flow on Instance B. The recipient encoded in the message would not be altered, but the funds would be minted/unlocked out of the wrong instance. (This relies on both instances using the same token mint also -- maybe that disqualifies this scenario? If so, we should document explicitly)
It looks like OutboxItem was updated so I think we just need something similar for the InboxItem.
Here's a test I was playing with -- if useful, we could modify the logic slightly to ensure that the inbox item can't be used on the wrong instance.
(test under the multi_instance.test.ts file)
it("demonstrates cross-instance release_inbound_mint uses an inbox item from another instance", async () => {
const vaaForB = makeInboundVaa(
"xinst-release-bug",
instanceB.publicKey,
payer.publicKey
);
// Prepare a valid, approved InboxItem for instance B, but do not release it.
await redeemWithoutRelease(nttB, vaaForB, sender, signer);
const recipientA = spl.getAssociatedTokenAddressSync(
testMintA.address,
payer.publicKey,
true,
TOKEN_PROGRAM
);
await spl.getOrCreateAssociatedTokenAccount(
$.connection,
payer,
testMintA.address,
payer.publicKey,
true,
undefined,
undefined,
TOKEN_PROGRAM,
spl.ASSOCIATED_TOKEN_PROGRAM_ID
);
await assert.tokenBalance($.connection, recipientA).equal(0);
// This should be rejected: config/mint/token_authority/custody are from A,
// while inbox_item is the approved PDA for B. The current program accepts it.
const maliciousReleaseIx = await nttA.program.methods
.releaseInboundMint({ revertOnDelay: false, revertWhenNotReady: false })
.accounts({
common: {
payer: payer.publicKey,
config: { config: nttA.pdas.configAccount() },
inboxItem: nttB.pdas.inboxItemAccount(
"Ethereum",
vaaForB.payload.nttManagerPayload
),
recipient: recipientA,
mint: testMintA.address,
tokenAuthority: nttA.pdas.tokenAuthority(),
tokenProgram: TOKEN_PROGRAM,
custody: await NTT.custodyAccountAddress(
nttA.pdas,
await nttA.getConfig()
),
},
multisigTokenAuthority: multisigTokenAuthorityA,
})
.instruction();
await $.sendAndConfirm(maliciousReleaseIx, payer);
// 25_000 at 8 decimals released to a 9-decimal mint becomes 250_000 units.
await assert.tokenBalance($.connection, recipientA).equal(250_000);
}); There was a problem hiding this comment.
this is valid, added a check for this!
| space = 8 + ValidatedTransceiverMessage::<TransceiverMessageData<NativeTokenTransfer<Payload>>>::INIT_SPACE, | ||
| seeds = [ | ||
| ValidatedTransceiverMessage::<TransceiverMessageData<NativeTokenTransfer<Payload>>>::SEED_PREFIX, | ||
| config.key().as_ref(), |
There was a problem hiding this comment.
I think we ought to add a check here such that the recipient_ntt_manager matches the instance key. Otherwise I think you can take a VAA intended for an Instance A and create the transceiver message for Instance B. That incorrectly creates the transceiver message
I believe the impact here is low-to-nil for a few reasons:
- The exist seeds include the emitter chain and ID as seeds
- Solana only allows one peer per Wormhole emitter chain
- The IDs increase monotically on the sender side
This should suffice to make the account resulting from the seeds truly unique. However if all of the above weren't true, it would be possible to e.g. relay messages sent from EVM peers intended for given SVM instances to the wrong SVM instance, and potentially front-run transceiver message accounts and DoS them via the init constraint.
Let me know if you agree with this read. IMO this is definitely worth a comment because understanding the security here relies on keeping a lot of NTT subtlety in your head.
There was a problem hiding this comment.
This should be applied to both files:
solana/programs/example-native-token-transfers/src/transceivers/wormhole/instructions/receive_message.rs
solana/programs/ntt-transceiver/src/wormhole/instructions/receive_message.rs
| )] | ||
| program_data: Account<'info, ProgramData>, | ||
| /// The owner of the new instance. Distinct from the program's upgrade | ||
| /// authority — see the v4 trust-model note in the README. |
There was a problem hiding this comment.
There are no README changes in this PR. Maybe the design document could be included in this PR? Easy to do in a follow-up PR though.
|
|
||
| #[account( | ||
| seeds = [crate::TOKEN_AUTHORITY_SEED], | ||
| seeds = [crate::TOKEN_AUTHORITY_SEED, config.key().as_ref()], |
There was a problem hiding this comment.
@claude Verify whether the TS files in the SDK are correctly deriving the token and mint authorities near e.g. cli/src/commands/set-mint-authority.ts:152.
This change makes it so the instance (config) key is part of the seeds. The relevant code is not obviously shown in the diff so when proving/disproving this claim, quote the relevant code directly and include line numbers.
There was a problem hiding this comment.
Lol Claude emoji reacted but didn't complete I guess?
| { chain, address: toUniversal(chain, chainConfig.manager) }, | ||
| overrides | ||
| overrides, | ||
| chainConfig.instance |
There was a problem hiding this comment.
Further down in this file, the multisig is initialized with the token_authority based on the manager key. Should this include the instance key?
| constraint = ValidatedTransceiverMessage::<NativeTokenTransfer<Payload>>::message(&transceiver_message.try_borrow_data()?[..])?.ntt_manager_payload().payload.to_chain == config.chain_id @ NTTError::InvalidChainId, | ||
| // check that we're the intended recipient | ||
| constraint = ValidatedTransceiverMessage::<NativeTokenTransfer<Payload>>::message(&transceiver_message.try_borrow_data()?[..])?.recipient_ntt_manager() == crate::ID.to_bytes() @ NTTError::InvalidRecipientNttManager, | ||
| // v4: the on-the-wire manager identity is the instance's `config` pubkey. |
There was a problem hiding this comment.
I think we need an additional seed check here.
Previously, when a transceiver could have only a single instance associated with it, this was safe. Now, I think a maliciously crafted transceiver_message from another deployment on the same transceiver could be used. The id, recipient manager, source manager, and destination chain are all controllable via a crafted VAA, since this data is just in the body.
| mut, | ||
| constraint = !outbox_item.released.get(transceiver.id)? @ NTTError::MessageAlreadySent, | ||
| )] | ||
| pub outbox_item: Account<'info, OutboxItem>, |
There was a problem hiding this comment.
MarkOutboxItemAsReleased already checks that outbox_item.manager == config.key(). Could we add this here just for a sanity check? It's in the standalone version already.
| #[account( | ||
| constraint = transceiver.transceiver_address == crate::ID, | ||
| constraint = config.enabled_transceivers.get(transceiver.id)? @ NTTError::DisabledTransceiver | ||
| )] |
There was a problem hiding this comment.
Could we add a seed derivation for this? It's already checked in the MarkOutboxItemAsReleased on the CPI. But, it'd be good to add a second layer as a sanity check; it's already done in the standalone version of this.
No description provided.