
export functionality: start peers and wait for peer network to stabilize before exporting #162

Open

inqrphl wants to merge 3 commits into sni:master from inqrphl:export-add-support-for-tiered-lmd

inqrphl commented Jun 10, 2026

Exporting used to contact only the peers listed in the config and call only `initAllTables`. It did not start the peers at all and exported only the initial state of their tables, so sub-peers in the lower tiers were never discovered.

Starting a peer calls `peer.tryUpdate` -> `peer.updateTick` -> `datastoreset.updateDelta` -> `ds.sync.UpdateDelta`.

For LMD peers this fetches the `sites` table; for Thruk peers it queries the `/thruk/r/v1/sites` API endpoint. If a new site is discovered, it calls `peer.addSubPeer`.

The sub-peer is added to `lmd.peerMap` and started independently, so it can in turn discover its own sub-peers.
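To make the discovery chain concrete, here is a minimal sketch of how newly seen sites can fan out into sub-peers across several tiers. It assumes a simplified in-memory model: `Peer`, `PeerMap`, `fetchSites`, `discover` and this `addSubPeer` are hypothetical stand-ins, not LMD's actual types or signatures. The recursion is synchronous for readability, whereas in the PR each sub-peer runs its own discovery independently.

```go
package main

import "sort"

// Peer is a hypothetical, simplified stand-in for an LMD peer.
type Peer struct {
	ID       string
	ParentID string
}

// PeerMap is a hypothetical stand-in for lmd.peerMap, keyed by peer ID.
type PeerMap map[string]*Peer

// addSubPeer registers a newly seen site as a sub-peer of parent and reports
// whether it was new. Sites that are already known are ignored.
func (m PeerMap) addSubPeer(parent *Peer, siteID string) (*Peer, bool) {
	if _, ok := m[siteID]; ok {
		return nil, false
	}
	sub := &Peer{ID: siteID, ParentID: parent.ID}
	m[siteID] = sub
	return sub, true
}

// discover asks p for the sites behind it and recurses into every new
// sub-peer, so lower tiers are found as well. fetchSites is a hypothetical
// callback standing in for the sites table / Thruk sites API lookup.
func discover(m PeerMap, p *Peer, fetchSites func(peerID string) []string) {
	for _, site := range fetchSites(p.ID) {
		if sub, isNew := m.addSubPeer(p, site); isNew {
			discover(m, sub, fetchSites)
		}
	}
}

// sortedIDs returns the peer IDs in a stable order.
func sortedIDs(m PeerMap) []string {
	ids := make([]string, 0, len(m))
	for id := range m {
		ids = append(ids, id)
	}
	sort.Strings(ids)
	return ids
}
```

Starting from a single configured tier-1 peer, every site reachable through the tiers below it ends up in the map exactly once.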

In the exporter, this PR adds a network stabilization function. It starts all peers from the configuration and lets each of them discover its part of the network independently.

It then waits for the network to stabilize in rounds. Each round checks the peer map to see whether the total peer count went up and whether every peer has data. Once the peer map has looked stable for three consecutive rounds, the function returns. It runs for at most 30 rounds, sleeping 10 seconds between rounds.

After the stabilization rounds, the peer tables are exported. The exported peer count is passed up the call chain all the way to `main.go`, so the reported number matches what was actually exported: once the maximum number of rounds is reached, more peers may still be discovered and added to `peerMap`, so reporting `len(peerMap)` as the exported count could be wrong.

inqrphl (author) commented Jun 10, 2026

Tested with the `lmd_federation_multitier_e2e` scenario, which spawns three tiers of OMD instances:

```
ahmet@jelinek:~/repositories/thruk/t/scenarios/lmd_federation_multitier_e2e$ docker ps
CONTAINER ID   IMAGE                                 COMMAND                  CREATED        STATUS        PORTS                                                                                                NAMES
84bccd3b8bfd   lmd_federation_multitier_e2e-tier3b   "/root/start.sh"         19 hours ago   Up 19 hours   22/tcp, 80/tcp, 443/tcp, 4730/tcp, 5666/tcp                                                          lmd_federation_multitier_e2e-tier3b-1
5e34d7bac3dc   lmd_federation_multitier_e2e-tier1a   "/root/start.sh"         19 hours ago   Up 19 hours   22/tcp, 4730/tcp, 5666/tcp, 127.0.0.3:60080->80/tcp, 127.0.0.3:60443->443/tcp                        lmd_federation_multitier_e2e-tier1a-1
e7e42c9bbafc   lmd_federation_multitier_e2e-tier3a   "/root/start.sh"         19 hours ago   Up 19 hours   22/tcp, 80/tcp, 443/tcp, 4730/tcp, 5666/tcp                                                          lmd_federation_multitier_e2e-tier3a-1
a753e362f6f3   lmd_federation_multitier_e2e-tier1d   "/root/start.sh"         19 hours ago   Up 19 hours   22/tcp, 4730/tcp, 5666/tcp, 127.0.0.3:60083->80/tcp, 127.0.0.3:60446->443/tcp                        lmd_federation_multitier_e2e-tier1d-1
733d6722ca07   lmd_federation_multitier_e2e-tier2c   "/root/start.sh"         19 hours ago   Up 19 hours   22/tcp, 80/tcp, 443/tcp, 4730/tcp, 5666/tcp                                                          lmd_federation_multitier_e2e-tier2c-1
f19f79fe12e1   lmd_federation_multitier_e2e-tier2a   "/root/start.sh"         19 hours ago   Up 19 hours   22/tcp, 80/tcp, 443/tcp, 4730/tcp, 5666/tcp                                                          lmd_federation_multitier_e2e-tier2a-1
4e24c683ecbb   lmd_federation_multitier_e2e-tier3c   "/root/start.sh"         19 hours ago   Up 19 hours   22/tcp, 80/tcp, 443/tcp, 4730/tcp, 5666/tcp                                                          lmd_federation_multitier_e2e-tier3c-1
475414119bff   lmd_federation_multitier_e2e-tier1b   "/root/start.sh"         19 hours ago   Up 19 hours   22/tcp, 4730/tcp, 5666/tcp, 127.0.0.3:60081->80/tcp, 127.0.0.3:60444->443/tcp                        lmd_federation_multitier_e2e-tier1b-1
c50110e0c366   lmd_federation_multitier_e2e-tier2e   "/root/start.sh"         19 hours ago   Up 19 hours   22/tcp, 80/tcp, 443/tcp, 4730/tcp, 5666/tcp                                                          lmd_federation_multitier_e2e-tier2e-1
de9eaaa392de   lmd_federation_multitier_e2e-tier1c   "/root/start.sh"         19 hours ago   Up 19 hours   22/tcp, 4730/tcp, 5666/tcp, 127.0.0.3:60082->80/tcp, 127.0.0.3:60445->443/tcp                        lmd_federation_multitier_e2e-tier1c-1
d1ee2e277e24   lmd_federation_multitier_e2e-tier2b   "/root/start.sh"         19 hours ago   Up 19 hours   22/tcp, 80/tcp, 443/tcp, 4730/tcp, 5666/tcp                                                          lmd_federation_multitier_e2e-tier2b-1
30acc5b640b2   lmd_federation_multitier_e2e-tier2d   "/root/start.sh"         19 hours ago   Up 19 hours   22/tcp, 80/tcp, 443/tcp, 4730/tcp, 5666/tcp                                                          lmd_federation_multitier_e2e-tier2d-1
731c643400a4   naemon-dev-box-devbox                 "/bin/sh -c /box/dev…"   2 weeks ago    Up 25 hours   0.0.0.0:1980->80/tcp, [::]:1980->80/tcp, 0.0.0.0:19443->443/tcp, [::]:19443->443/tcp                 naemon-dev-box-devbox-1
2fb85378e4fe   portainer/portainer-ce:lts            "/portainer"             3 months ago   Up 25 hours   0.0.0.0:8000->8000/tcp, [::]:8000->8000/tcp, 0.0.0.0:9443->9443/tcp, [::]:9443->9443/tcp, 9000/tcp   portainer
```
```
ahmet@jelinek:~/repositories/lmd$ ./lmd.linux.amd64 --logfile stdout -config lmd_export_test.ini --export export.tgz
[2026-06-10 12:15:22.016][Info][pid:669002][exporter:28] starting export to export.tgz
[2026-06-10 12:15:22.016][Info][pid:669002][peer:305] [tier1a] starting connection
[2026-06-10 12:15:22.016][Info][pid:669002][peer:305] [tier1b] starting connection
[2026-06-10 12:15:22.016][Info][pid:669002][peer:305] [tier1c] starting connection
[2026-06-10 12:15:22.016][Info][pid:669002][peer:305] [tier1d] starting connection
[2026-06-10 12:15:22.016][Info][pid:669002][exporter:201] waiting for all peers to connect and initialize
[2026-06-10 12:15:22.060][Info][pid:669002][peer:1519] [tier1d] remote connection MultiBackend flag set, got 2 sites
[2026-06-10 12:15:22.062][Info][pid:669002][peer:1519] [tier1a] remote connection MultiBackend flag set, got 8 sites
[2026-06-10 12:15:22.122][Info][pid:669002][peer:726] [tier1d] initial objects synchronized in: 62.120492ms
[2026-06-10 12:15:22.125][Info][pid:669002][peer:726] [tier1a] initial objects synchronized in: 63.168691ms
[2026-06-10 12:15:22.142][Info][pid:669002][peer:726] [tier1d] initial objects synchronized in: 125.336869ms
[2026-06-10 12:15:22.148][Info][pid:669002][peer:726] [tier1a] initial objects synchronized in: 131.785687ms
[2026-06-10 12:15:22.168][Info][pid:669002][peer:726] [tier1b] initial objects synchronized in: 151.876835ms
[2026-06-10 12:15:22.170][Info][pid:669002][peer:726] [tier1c] initial objects synchronized in: 153.361301ms
[2026-06-10 12:15:29.370][Info][pid:669002][peer:305] [tier1d] starting connection
[2026-06-10 12:15:29.371][Info][pid:669002][peer:305] [tier2d] starting connection
[2026-06-10 12:15:29.384][Info][pid:669002][peer:305] [tier1a] starting connection
[2026-06-10 12:15:29.384][Info][pid:669002][peer:305] [tier2b] starting connection
[2026-06-10 12:15:29.384][Info][pid:669002][peer:305] [tier2c] starting connection
[2026-06-10 12:15:29.384][Info][pid:669002][peer:305] [tier2e] starting connection
[2026-06-10 12:15:29.384][Info][pid:669002][peer:305] [tier3c] starting connection
[2026-06-10 12:15:29.384][Info][pid:669002][peer:305] [tier2a] starting connection
[2026-06-10 12:15:29.384][Info][pid:669002][peer:305] [tier3a] starting connection
[2026-06-10 12:15:29.385][Info][pid:669002][peer:305] [tier3b] starting connection
[2026-06-10 12:15:29.576][Info][pid:669002][peer:726] [tier1d] initial objects synchronized in: 204.976402ms
[2026-06-10 12:15:29.607][Info][pid:669002][peer:726] [tier2d] initial objects synchronized in: 236.232081ms
[2026-06-10 12:15:29.869][Info][pid:669002][peer:726] [tier2e] initial objects synchronized in: 484.619575ms
[2026-06-10 12:15:29.880][Info][pid:669002][peer:726] [tier1a] initial objects synchronized in: 495.708886ms
[2026-06-10 12:15:29.901][Info][pid:669002][peer:726] [tier3b] initial objects synchronized in: 516.649309ms
[2026-06-10 12:15:30.171][Info][pid:669002][peer:726] [tier2c] initial objects synchronized in: 785.288771ms
[2026-06-10 12:15:30.195][Info][pid:669002][peer:726] [tier3a] initial objects synchronized in: 810.205028ms
[2026-06-10 12:15:30.572][Info][pid:669002][peer:726] [tier3c] initial objects synchronized in: 1.187727796s
[2026-06-10 12:15:30.583][Info][pid:669002][peer:726] [tier2b] initial objects synchronized in: 1.198999243s
[2026-06-10 12:15:30.845][Info][pid:669002][peer:726] [tier2a] initial objects synchronized in: 1.460156078s
[2026-06-10 12:16:02.019][Info][pid:669002][exporter:203] all peers ready for export (14 peers)
[2026-06-10 12:16:02.022][Info][pid:669002][exporter:122] exported     tier1b (tier1b), used space:       15 kb
[2026-06-10 12:16:02.023][Info][pid:669002][exporter:122] exported     tier1c (tier1c), used space:       15 kb
[2026-06-10 12:16:02.025][Info][pid:669002][exporter:122] exported     tier1d (8d6dc), used space:       16 kb
[2026-06-10 12:16:02.027][Info][pid:669002][exporter:122] exported     tier2d (881f5), used space:       15 kb
[2026-06-10 12:16:02.029][Info][pid:669002][exporter:122] exported     tier1a (2ae43), used space:       17 kb
[2026-06-10 12:16:02.032][Info][pid:669002][exporter:122] exported     tier2b (c369c), used space:       15 kb
[2026-06-10 12:16:02.034][Info][pid:669002][exporter:122] exported     tier2c (c784e), used space:       15 kb
[2026-06-10 12:16:02.037][Info][pid:669002][exporter:122] exported     tier2e (84bd2), used space:       16 kb
[2026-06-10 12:16:02.039][Info][pid:669002][exporter:122] exported     tier3c (a8dd1), used space:       35 kb
[2026-06-10 12:16:02.045][Info][pid:669002][exporter:122] exported     tier2a (c21da), used space:       18 kb
[2026-06-10 12:16:02.047][Info][pid:669002][exporter:122] exported     tier3a (5f0ee), used space:       17 kb
[2026-06-10 12:16:02.048][Info][pid:669002][exporter:122] exported     tier3b (e984d), used space:       16 kb
[2026-06-10 12:16:02.048][Info][pid:669002][main:385] exported 12 peers successfully
```

…erAndDiscoverSubpeers

This function calls `peer.initAllTables` and `peer.updateTick` once and does nothing else with the peer.

Use this function to initialize the peers defined in the config, as well as any new peers that appear while waiting for the peer network to stabilize.

inqrphl (author) commented Jun 10, 2026

Starting a peer would leave it running in the background indefinitely, even while the peer network was still stabilizing.

Instead of starting the peers, I wrote a new function, `initializePeerAndDiscoverSubpeers`. It calls `peer.initAllTables` and then runs `peer.updateTick` once manually, which discovers the sub-peers.

The function is called first for the peers in the LMD config. Then, while waiting for stabilization, the peer list is iterated and the function is called for any new peers as well.
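One way to call the initializer only for peers that are new since the previous round is to keep a set of already initialized IDs. This is a minimal sketch: `initNewPeers` and the `initPeer` callback are hypothetical, with `initPeer` standing in for `initializePeerAndDiscoverSubpeers`.

```go
package main

// initNewPeers calls initPeer for every peer in current that has not been
// initialized yet, marks it in initialized, and returns the IDs it handled
// in this call. Peers seen in earlier rounds are skipped.
func initNewPeers(current []string, initialized map[string]bool, initPeer func(id string)) []string {
	var started []string
	for _, id := range current {
		if initialized[id] {
			continue
		}
		initialized[id] = true
		initPeer(id)
		started = append(started, id)
	}
	return started
}
```

The same `initialized` map is passed in every round, so the config peers are handled on the first call and each round afterwards only touches sub-peers that were discovered in the meantime.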

The function calls `waitGroup.Done()` at the end, so it can run concurrently for different peers in separate goroutines.

