Skip to content

perf: reuse a shared aiohttp session and cache templates#3

Merged
aliel merged 3 commits into
mainfrom
perf/shared-http-session
Jun 13, 2026
Merged

perf: reuse a shared aiohttp session and cache templates#3
aliel merged 3 commits into
mainfrom
perf/shared-http-session

Conversation

@aliel

@aliel aliel commented Jun 13, 2026

Copy link
Copy Markdown
Member

Replace the per-request aiohttp.ClientSession in fetch_crn_endpoint, _fetch_node_list and fetch_gpu_aggregate with a single shared session. Each refresh fans out 3 requests per node, so opening a new session per call meant a fresh TCP and TLS handshake every time with no connection reuse. A shared session pools keep-alive connections per host.

Bound total concurrency with the TCPConnector limit instead of the asyncio semaphore, and close the session on shutdown via a lifespan handler. The fd-limit value now feeds the connector limit.

Read index.html and debug.html once at import instead of on every request to "/" and "/debug.html".

Note: this also carries a pre-existing working-tree change to DIAG_VM_PATH_IPv6_CHECK that was already staged in the tree.

@aliel aliel requested a review from foxpatch-aleph June 13, 2026 12:51

@foxpatch-aleph foxpatch-aleph left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Clean performance refactor that replaces per-request aiohttp.ClientSession objects with a single shared session, enabling TCP/TLS connection reuse. Also caches HTML templates at import time instead of reading them on every request. The asyncio semaphore is correctly replaced with the TCPConnector's built-in connection limit. There's a minor theoretical race on session creation during cold start, but in a single-threaded async event loop this is exceedingly unlikely and non-destructive if it does occur. Overall a well-scoped, correct improvement.

Replace the per-request aiohttp.ClientSession in fetch_crn_endpoint,
_fetch_node_list and fetch_gpu_aggregate with a single shared session.
Each refresh fans out 3 requests per node, so opening a new session per
call meant a fresh TCP and TLS handshake every time with no connection
reuse. A shared session pools keep-alive connections per host.

Bound total concurrency with the TCPConnector limit instead of the
asyncio semaphore, and close the session on shutdown via a lifespan
handler. The fd-limit value now feeds the connector limit.

Read index.html and debug.html once at import instead of on every
request to "/" and "/debug.html".

Note: this also carries a pre-existing working-tree change to
DIAG_VM_PATH_IPv6_CHECK that was already staged in the tree.
@aliel aliel force-pushed the perf/shared-http-session branch from 319b748 to a7e9f37 Compare June 13, 2026 19:51

@foxpatch-aleph foxpatch-aleph left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well-structured performance refactor that consolidates per-request aiohttp.ClientSession instances into a single shared session for connection pooling and TLS handshake reuse. The TCPConnector replaces the asyncio.Semaphore to bound concurrency at the transport level with equivalent limits. Templates are cached at module level to avoid redundant filesystem reads. The lifespan handler correctly closes the session on shutdown. No bugs or security issues found — the change is clean and correct.

aliel added 2 commits June 13, 2026 22:20
An aiohttp ClientSession is bound to the event loop it was created on.
The module-global session was being reused across loops in the test
harness (pytest-asyncio and TestClient create a fresh loop per test),
which raised "Event loop is closed" from the background refresh task.

Track the loop the session was created on and recreate it when the
running loop changes. Production runs a single long-lived loop, so one
session is still reused for the process lifetime.
The aggregate mocks pointed at api2.aleph.im while the code uses
API_HOST=official.aleph.cloud, so every mocked request missed and the
tests failed. Point them at the real host.

For the test node (version 1.3.0, which is <= 1.8.1) the code probes
IPv6 through the diag VM endpoint, not /status/check/ipv6, so mock that
endpoint instead. Assert the debug_usage_from_crn_at and
usage_from_crn_error fields the response currently emits.

Pin aiohttp below 3.14 and bump aioresponses to 0.7.8: aiohttp 3.14
changed ClientResponse.__init__ and no released aioresponses can build
it, which made the mocked tests raise TypeError.

Note: this also carries pre-existing working-tree package-mode lines in
pyproject.toml.

@foxpatch-aleph foxpatch-aleph left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sound performance optimization replacing per-request aiohttp sessions with a shared, connection-pooled session. The connector's limit properly replaces the semaphore for concurrency control, and template caching at import time is a simple win. No correctness or security issues introduced. One pre-existing bug in format_response (usage fields referencing config instead of system data) is worth fixing separately.

src/nodes_list/main.py (line 479): Pre-existing bug (not introduced by this PR): "debug_usage_from_crn_at" and "usage_from_crn_error" both reference crn_info.config when they should reference crn_info.system (i.e., crn_info.system.fetched_at and str(crn_info.system.error)). The test at test_routes.py:88-89 asserts these values against the config timestamp, encoding the bug rather than catching it.

@aliel aliel merged commit 30c785f into main Jun 13, 2026
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants