[KOL-91] Implement Ethereum reorg mitigation#512
Conversation
…t confirmation_depth
… fast-track path in poller
Deploying kolme with
|
| Latest commit: |
7a04403
|
| Status: | ✅ Deploy successful! |
| Preview URL: | https://119341b8.kolme.pages.dev |
| Branch Preview URL: | https://ethereum-bridge-4-reorgs.kolme.pages.dev |
feaa983 to
bc6bc97
Compare
| /// This setting is generic for future chain listeners, but currently enforced | ||
| /// only by the Ethereum listener. | ||
| #[serde(default = "ConfirmationDepth::use_default")] | ||
| pub confirmation_depth: ConfirmationDepth, |
There was a problem hiding this comment.
I originally hadn't considered this as part of the chain config itself, but rather a config set in the listeners. I can see the advantage of having it as part of the chain config, but it's also somewhat unenforceable. Putting it here does force a migration of data, but it's not a big deal. And even though the existing chains don't use a confirmation depth, they probably should, so I'm OK with this being present for all chain types.
Any thoughts on chain config vs listener-specific config?
There was a problem hiding this comment.
Chain config seems to be the single source of truth for all nodes - so storing confirmation depth there seems more robust.
Could you please explain what did you mean by unenforceable?
There was a problem hiding this comment.
Unenforceable in the sense that it's the listeners themselves who will be deciding whether sufficient confirmations are present, and they can choose to ignore this setting if they so choose. With other verifications, external entities can confirm whether a validator cheated (e.g., issuing a withdrawal to a chain without a valid Kolme-chain action, or a processor producing invalid blocks). For this case, there would still be a valid transaction on the external chain, it's just a question of whether the listener added it earlier than they should have.
There was a problem hiding this comment.
they can choose to ignore this setting
But this depends on listener's code, and not on where setting is stored, if I understand you correctly?
There was a problem hiding this comment.
Correct. That's my point: "enforcing" this at the chain level doesn't add any extra protection.
| EthereumChain::Mainnet | EthereumChain::Sepolia => POLL_INTERVAL_DEFAULT, | ||
| }; | ||
| let mut next_ws_retry = tokio::time::Instant::now(); | ||
| let mut subscriber_task: Option<tokio::task::JoinHandle<()>> = None; |
There was a problem hiding this comment.
This approach risks the spawned task outliving its parent, which I don't think is intended. There are a few ways to approach this, but I'm not really sure what the goal of the overall code is, so I'm not sure what to propose.
There was a problem hiding this comment.
The goal is to launch a subscriber, which gets events and stores their ids and block ids as hints for the poller, so it can skip blocks and jump straight to a next event blocks, when it becomes eligible.
There was a problem hiding this comment.
I mean why does this need to be a separate task instead of in the current task?
There was a problem hiding this comment.
Because subscriber runs concurrently with a poller. When subscriber receives events, they're beyond the reach of the poller, but if it has issues running (e.g. RPC provider does not support WS) - we don't want it to affect poller's work at all.
There was a problem hiding this comment.
I will fix the "exit when parent is done" thing.
There was a problem hiding this comment.
The approach doesn't work in general: panics will not lead to the subtask being aborted. The part of goal here that I was unclear on is why do we need both an external loop and a separate polling task, plus why we didn't need it previously.
There was a problem hiding this comment.
why we didn't need it previously
We did not need it, because subscriber was the default source of data and polling was a fallback. So, subscriber, technically, had its own loop, just not in parallel with poller.
For the design, that has been implemented in current PR, I did not want stream.next().await to block poller. Now, that I've looked more thoroughly and found out that tokio::select!() can be used to orchestrate poller and subscriber and, potentially, make whole thing more clean. I'll implement it and update you.
There was a problem hiding this comment.
I don't see any new commits on the PR, can you link to the change?
There was a problem hiding this comment.
Sorry, forgot to push. Fixed.
…h setting This reverts commit de6b6db.
| /// only by the Ethereum listener. | ||
| #[serde(default = "ConfirmationDepth::use_default")] | ||
| pub confirmation_depth: ConfirmationDepth, | ||
| pub confirmation_depth: Option<u64>, |
There was a problem hiding this comment.
This still leaves two questions:
- Do we want to allow for a "default" value here, which may change implicitly with code base updates? I lean towards no.
- Do we want this as part of chain config as opposed to runtime listener config.
| EthereumChain::Mainnet | EthereumChain::Sepolia => POLL_INTERVAL_DEFAULT, | ||
| }; | ||
| let mut next_ws_retry = tokio::time::Instant::now(); | ||
| let mut subscriber_task: Option<tokio::task::JoinHandle<()>> = None; |
There was a problem hiding this comment.
The approach doesn't work in general: panics will not lead to the subtask being aborted. The part of goal here that I was unclear on is why do we need both an external loop and a separate polling task, plus why we didn't need it previously.
Also: - do some refactoring and cleanup - separate Startup and Runtime errors from the subscriber - the former lead to subscriber restart for a limited amount of times, the latter do not
snoyberg
left a comment
There was a problem hiding this comment.
I'm not comfortable with the approach here, and it still comes back to my original confusion about why all this code is needed. If we're legitimately just running two tasks simultaneously, why not use select! or JoinSet? The code structure looks like it's allowing for optionality in the subscriber_supervisor which is never actually needed.
|
|
||
| impl SubscriberTaskGuard { | ||
| fn set(&mut self, handle: tokio::task::JoinHandle<()>) { | ||
| self.handle = Some(handle); |
There was a problem hiding this comment.
This won't abort an existing handle.
Task: https://fpcomplete.atlassian.net/browse/KOL-91
confirmation_depth(Option<u64>) field toChainConfigNoneconfirmation_depthset to1, and inside this test ensure that event is not processed until extra block is generated