Skip to content

sync+net: template for GlobalMutex, move fields to Peer#2

Open
Crypt-iQ wants to merge 2 commits into
masterfrom
06102026/split_cnodestate
Open

sync+net: template for GlobalMutex, move fields to Peer#2
Crypt-iQ wants to merge 2 commits into
masterfrom
06102026/split_cnodestate

Conversation

@Crypt-iQ

Copy link
Copy Markdown
Owner

No description provided.

@Crypt-iQ Crypt-iQ force-pushed the 06102026/split_cnodestate branch from ed142d5 to 41c4d6a Compare June 16, 2026 18:13
I am not sure if this was intentional, but the explicit instantiation
for GlobalMutex was missing.
@Crypt-iQ Crypt-iQ force-pushed the 06102026/split_cnodestate branch 2 times, most recently from db737e4 to 722aa70 Compare June 16, 2026 18:16
Move vBlocksInFlight, m_stalling_since, m_downloading_since to Peer
and guard these fields (and mapBlocksInFlight) with a GlobalMutex
cs_blocks_in_flight. This allows BlockRequested to take Peer& instead
of NodeId.

Notes:

- GlobalMutex is chosen because both mapBlocksInFlight and
  vBlocksInFlight must be locked under the same mutex since
  mapBlocksInFlight contains an iterator to vBlocksInFlight. A
  RecursiveMutex was not chosen because the lock is never acquired
  twice. The global mutex is a weird design and putting the mutex
  in PeerManagerImpl doesn't seem possible without moving Peer inside of
  PeerManagerImpl due to the compiler failing due to incomplete type
  errors.

- mapBlocksInFlight has an implicit assumption that any NodeId found
  in its map is still able to be queried in m_node_states & m_peer_map.
  This change preserves this behavior in FinalizeNode.

- SendMessages acquires the mutex for a short time, and because of this,
  there is technically a minor behavior change where the fields could have
  been modified by another thread before locking cs_blocks_in_flight, while
  cs_main is held. This doesn't race or anything. The mutex cannot be locked
  where cs_main is because there would be a "potential deadlock" (previous
  patch did this):

    - thread 1: ActivateBestChain
                  LOCK(cs_main)
                  LOCK(MempoolMutex())                    (1)
                    BlockChecked
                    LOCK2(cs_main, cs_blocks_in_flight)   (2)

    - thread 2: SendMessages
                  LOCK2(cs_main, cs_blocks_in_flight)     (2)
                  LOCK(MempoolMutex())                    (1)

  A deadlock cannot actually occur here because cs_main is the outer lock so
  the two threads would be waiting on cs_main instead. However, if cs_main were
  ever removed, this would pop up.

- The above point about deadlock made me realize something kind of fundamentally
  wrong about my approach. Making a more granular mutex to lock mapBlocksInFlight
  or vBlocksInFlight seems impossible because of the cmpctblock logic also acquiring
  the mempool logic (the fuzz tests caught this one):

    - thread 1: ActivateBestChain
                  LOCK(cs_main)
                  LOCK(MempoolMutex())                    (1)
                    BlockChecked
                    LOCK2(cs_main, cs_blocks_in_flight)   (2)

    - thread 2: ProcessMessages (NetMsgType::CMPCTBLOCK)
                  LOCK(cs_main)
                  LOCK(cs_blocks_in_flight)               (2) *
                    partialBlock.InitData
                    LOCK(pool->cs)                        (1)

    * Calling into BlockRequested and setting / using pit means that cs_blocks_in_flight
      MUST be held while calling into InitData otherwise we might have a dangling
      reference.

    The above deadlock makes me think that having these two fields locked by anything
    other than cs_main will inevitably lead to deadlock. Maybe I'm missing something?
    I think it would be an anti-pattern to move these fields and also lock them with
    cs_main?

- TSA were added, some may be superfluous, if this approach is worth
  going forward with, I'll clean this up.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant