perf: reuse a shared aiohttp session and cache templates#3
Conversation
foxpatch-aleph
left a comment
There was a problem hiding this comment.
Clean performance refactor that replaces per-request aiohttp.ClientSession objects with a single shared session, enabling TCP/TLS connection reuse. Also caches HTML templates at import time instead of reading them on every request. The asyncio semaphore is correctly replaced with the TCPConnector's built-in connection limit. There's a minor theoretical race on session creation during cold start, but in a single-threaded async event loop this is exceedingly unlikely and non-destructive if it does occur. Overall a well-scoped, correct improvement.
Replace the per-request aiohttp.ClientSession in fetch_crn_endpoint, _fetch_node_list and fetch_gpu_aggregate with a single shared session. Each refresh fans out 3 requests per node, so opening a new session per call meant a fresh TCP and TLS handshake every time with no connection reuse. A shared session pools keep-alive connections per host. Bound total concurrency with the TCPConnector limit instead of the asyncio semaphore, and close the session on shutdown via a lifespan handler. The fd-limit value now feeds the connector limit. Read index.html and debug.html once at import instead of on every request to "/" and "/debug.html". Note: this also carries a pre-existing working-tree change to DIAG_VM_PATH_IPv6_CHECK that was already staged in the tree.
319b748 to
a7e9f37
Compare
foxpatch-aleph
left a comment
There was a problem hiding this comment.
Well-structured performance refactor that consolidates per-request aiohttp.ClientSession instances into a single shared session for connection pooling and TLS handshake reuse. The TCPConnector replaces the asyncio.Semaphore to bound concurrency at the transport level with equivalent limits. Templates are cached at module level to avoid redundant filesystem reads. The lifespan handler correctly closes the session on shutdown. No bugs or security issues found — the change is clean and correct.
An aiohttp ClientSession is bound to the event loop it was created on. The module-global session was being reused across loops in the test harness (pytest-asyncio and TestClient create a fresh loop per test), which raised "Event loop is closed" from the background refresh task. Track the loop the session was created on and recreate it when the running loop changes. Production runs a single long-lived loop, so one session is still reused for the process lifetime.
The aggregate mocks pointed at api2.aleph.im while the code uses API_HOST=official.aleph.cloud, so every mocked request missed and the tests failed. Point them at the real host. For the test node (version 1.3.0, which is <= 1.8.1) the code probes IPv6 through the diag VM endpoint, not /status/check/ipv6, so mock that endpoint instead. Assert the debug_usage_from_crn_at and usage_from_crn_error fields the response currently emits. Pin aiohttp below 3.14 and bump aioresponses to 0.7.8: aiohttp 3.14 changed ClientResponse.__init__ and no released aioresponses can build it, which made the mocked tests raise TypeError. Note: this also carries pre-existing working-tree package-mode lines in pyproject.toml.
foxpatch-aleph
left a comment
There was a problem hiding this comment.
Sound performance optimization replacing per-request aiohttp sessions with a shared, connection-pooled session. The connector's limit properly replaces the semaphore for concurrency control, and template caching at import time is a simple win. No correctness or security issues introduced. One pre-existing bug in format_response (usage fields referencing config instead of system data) is worth fixing separately.
src/nodes_list/main.py (line 479): Pre-existing bug (not introduced by this PR): "debug_usage_from_crn_at" and "usage_from_crn_error" both reference crn_info.config when they should reference crn_info.system (i.e., crn_info.system.fetched_at and str(crn_info.system.error)). The test at test_routes.py:88-89 asserts these values against the config timestamp, encoding the bug rather than catching it.
Replace the per-request aiohttp.ClientSession in fetch_crn_endpoint, _fetch_node_list and fetch_gpu_aggregate with a single shared session. Each refresh fans out 3 requests per node, so opening a new session per call meant a fresh TCP and TLS handshake every time with no connection reuse. A shared session pools keep-alive connections per host.
Bound total concurrency with the TCPConnector limit instead of the asyncio semaphore, and close the session on shutdown via a lifespan handler. The fd-limit value now feeds the connector limit.
Read index.html and debug.html once at import instead of on every request to "/" and "/debug.html".
Note: this also carries a pre-existing working-tree change to DIAG_VM_PATH_IPv6_CHECK that was already staged in the tree.