The source-context (Etherscan verified source + ABI) and Swiss Knife label caches were process-lifetime dicts, so every cron run re-fetched the same immutable data. On GitHub Actions that was unavoidable; on the VPS the cache dir (/srv/cache) is persistent, so we can reuse it across runs.

Add utils/disk_cache.py: a small file-backed JSON cache (one file per key) with per-entry TTL and size-bounded LRU eviction, mirroring the existing selector-cache pattern in calldata/decoder.py. Writes are atomic (temp + os.replace) so overlapping hourly/multisig profiles can't read a half-written entry. Resolved under CACHE_DIR lazily so tests can redirect it.

Wire it under the existing in-memory dicts in source_context and swiss_knife:

- Positive (found) entries never expire: verified source/labels are immutable per address.
- Negative (miss) entries get a 1-day TTL so a contract verified later, or an address that later gains a label, is picked up.
- Only genuine "unverified"/"no label" responses are persisted; transient request failures stay in-memory so an Etherscan/Swiss-Knife blip can't poison the disk cache as a day-long negative.

Tests redirect CACHE_DIR to a per-test temp dir (conftest) so caches never litter the repo or leak between tests. Adds unit tests for disk_cache and cross-process-restart persistence tests for both integrations.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
get() now touches the file mtime (os.utime) on a successful, non-expired read, so eviction, which sorts by mtime, reflects least-recently-USED rather than oldest-written. Without this, a contract re-read every cron run could be evicted just for having been written early, defeating the cache's purpose for long-pending Safe txs / proposals.

TTL stays keyed off the stored JSON write time, not mtime, so refreshing recency on read never extends a negative entry's lifetime.

Adds a test that reads "a" before inserting "c" and asserts the unread "b" is the one evicted.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
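The touch-on-read behaviour can be illustrated in isolation. The sketch below is an assumption-laden stand-in, not the code from this commit: read_and_touch and evict_lru are hypothetical helper names, and only count-based eviction is shown (the byte bound is omitted).

```python
import os
import tempfile
from pathlib import Path


def read_and_touch(path: Path) -> str:
    """Read an entry and bump its mtime so it counts as recently used."""
    data = path.read_text()
    # Only filesystem metadata changes; any write time stored inside the JSON
    # is untouched, so a TTL keyed off that field is not extended by reads.
    os.utime(path, None)
    return data


def evict_lru(cache_dir: Path, max_entries: int) -> list[str]:
    """Delete entries with the oldest mtime until max_entries remain."""
    files = sorted(cache_dir.glob("*.json"), key=lambda p: p.stat().st_mtime)
    excess = max(0, len(files) - max_entries)
    victims = files[:excess]
    for victim in victims:
        victim.unlink(missing_ok=True)
    return [v.name for v in victims]


if __name__ == "__main__":
    # Mirrors the scenario in the commit message: read "a", insert "c",
    # then evict down to two entries.
    cache_dir = Path(tempfile.mkdtemp())
    for name, mtime in (("a", 1000), ("b", 2000)):
        entry = cache_dir / f"{name}.json"
        entry.write_text("{}")
        os.utime(entry, (mtime, mtime))  # pretend both were written long ago
    read_and_touch(cache_dir / "a.json")
    (cache_dir / "c.json").write_text("{}")
    print(evict_lru(cache_dir, max_entries=2))
```

Without the os.utime call in read_and_touch, "a" would have the oldest mtime and would be evicted despite being the entry that was just used.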
Why
The source-context (Etherscan verified source + ABI) and Swiss Knife address-label caches were process-lifetime dicts, so every cron run re-fetched the same immutable data. On GitHub Actions that was unavoidable; on the VPS the cache dir (/srv/cache) is persistent, so we can reuse it across runs.

Concretely: the multisig profile runs every 10 min and re-decodes the same pending Safe txs, and each decode re-fetched the target contract's source/ABI/labels from Etherscan/Swiss Knife. A pending tx lives for days, so the same immutable source was re-downloaded hundreds of times before the tx ever executed.

What
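utils/disk_cache.py: a small file-backed JSON cache (one file per key) with per-entry TTL and size-bounded LRU eviction, mirroring the existing selector-cache.txt pattern in calldata/decoder.py. Writes are atomic (temp file + os.replace) so overlapping hourly/multisig profiles never read a half-written entry. The cache dir resolves under CACHE_DIR lazily so tests can redirect it.

It is wired under the existing in-memory dicts in source_context and swiss_knife (no call-site changes):

- Positive (found) entries never expire: verified source/labels are immutable per address.
- Negative (miss) entries get a 1-day TTL so a contract verified later, or an address that later gains a label, is picked up.
- Only genuine "unverified"/"no label" responses are persisted; transient request failures stay in-memory.

Bonus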
Any contract seen once keeps decoding through an upstream outage, which turns this from a speed win into a small reliability win too.
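A rough sketch of why that holds, assuming a layered lookup of memory, then disk, then upstream. Everything here is hypothetical scaffolding rather than the code in source_context or swiss_knife: lookup, FakeDisk (an in-process stand-in for the disk cache), and TransientError are invented names, and fetch is assumed to return None for a genuine "unverified"/"no label" response and to raise on a transient failure.

```python
import time

MISS = object()  # assumed sentinel: "nothing cached", distinct from cached None
NEGATIVE_TTL_SECONDS = 86400  # 1 day, the documented default


class TransientError(Exception):
    """Hypothetical: the upstream request itself failed (timeout, 5xx, ...)."""


class FakeDisk:
    """Stand-in for the disk cache: maps key -> (value, expires_at or None)."""

    def __init__(self):
        self.entries = {}

    def get(self, key, now):
        hit = self.entries.get(key)
        if hit is None:
            return MISS
        value, expires_at = hit
        if expires_at is not None and now >= expires_at:
            return MISS
        return value

    def put(self, key, value, expires_at):
        self.entries[key] = (value, expires_at)


def lookup(address, memory, disk, fetch, now=None):
    now = time.time() if now is None else now
    if address in memory:  # 1. process-lifetime dict
        return memory[address]
    cached = disk.get(address, now)  # 2. persistent cache, survives restarts
    if cached is not MISS:
        memory[address] = cached
        return cached
    try:
        result = fetch(address)  # 3. upstream; None means a genuine negative
    except TransientError:
        memory[address] = None  # remembered for this run only, never on disk
        return None
    # Positive entries never expire; negative ones get the 1-day TTL.
    expires_at = None if result is not None else now + NEGATIVE_TTL_SECONDS
    disk.put(address, result, expires_at)
    memory[address] = result
    return result
```

In this sketch a fresh process (an empty memory dict) that finds the address on disk never calls fetch, so an upstream outage goes unnoticed for any contract already seen. A failure for an unseen address leaves the disk untouched, so the next run retries instead of trusting a day-long negative.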
Config (env, all optional)
- CACHE_NEGATIVE_TTL_SECONDS (default 86400, i.e. 1 day)
- SOURCE_CACHE_MAX_ENTRIES (default 5000), SOURCE_CACHE_MAX_BYTES (default 256 MiB)
- LABEL_CACHE_MAX_ENTRIES (default 50000)

Cache dirs (source-cache/, label-cache/) live under CACHE_DIR and are gitignored.

Tests
- tests/conftest.py redirects CACHE_DIR to a per-test temp dir so caches never litter the repo or leak between tests.
- tests/test_disk_cache.py: unit tests for the cache (roundtrip, MISS vs cached-None, TTL expiry, count/byte eviction, corrupt-file resilience).
- test_source_context.py and test_swiss_knife.py: cross-process-restart persistence tests, including the "transient errors are not persisted" guarantee.

Full suite: 484 passed, 4 skipped. ruff clean; mypy clean on the changed modules.

🤖 Generated with Claude Code