feat(cache): persist Etherscan source/ABI and Swiss Knife labels to disk#268

Draft
spalen0 wants to merge 2 commits into main from llm-cache

spalen0 commented Jun 9, 2026

Collaborator

Why

The source-context (Etherscan verified source + ABI) and Swiss Knife address-label caches were process-lifetime dicts, so every cron run re-fetched the same immutable data. On GitHub Actions that was unavoidable; on the VPS the cache dir (/srv/cache) is persistent, so we can reuse it across runs.

Concretely: the multisig profile runs every 10 min and re-decodes the same pending Safe txs — each decode re-fetched the target contract's source/ABI/labels from Etherscan/Swiss Knife. A pending tx lives for days, so the same immutable source was re-downloaded hundreds of times before the tx ever executed.

What

  • New utils/disk_cache.py — a small file-backed JSON cache (one file per key) with per-entry TTL and size-bounded LRU eviction, mirroring the existing selector-cache.txt pattern in calldata/decoder.py. Writes are atomic (temp file + os.replace) so overlapping hourly/multisig profiles never read a half-written entry. The cache dir resolves under CACHE_DIR lazily so tests can redirect it.
  • Wired under the existing in-memory dicts in source_context and swiss_knife (no call-site changes):
    • Positive (found) entries never expire — verified source and curated labels are immutable per address.
    • Negative (miss) entries get a 1-day TTL so a contract verified later, or an address that later gains a label, is picked up.
    • Only genuine "unverified" / "no label" responses are persisted; transient request failures stay in-memory only, so an Etherscan/Swiss-Knife blip can't poison the disk cache as a day-long negative.
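
The persistence rules in the bullets above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code in this PR: `lookup`, `fetch`, `TransientError`, and the two plain dicts standing in for the in-memory and on-disk layers are all hypothetical names.

```python
import time
from typing import Callable, Dict, Optional, Tuple

NEGATIVE_TTL_SECONDS = 86400  # 1 day, the default described in this PR


class TransientError(Exception):
    """Hypothetical marker for a failed upstream request (timeout, 5xx, ...)."""


# Stand-ins for the two layers (assumption: the real disk layer is file-backed).
memory: Dict[str, Optional[str]] = {}
disk: Dict[str, Tuple[float, Optional[str]]] = {}  # key -> (written_at, value)


def lookup(
    key: str,
    fetch: Callable[[str], Optional[str]],
    now: Optional[float] = None,
) -> Optional[str]:
    now = time.time() if now is None else now
    if key in memory:
        return memory[key]
    if key in disk:
        written_at, value = disk[key]
        # Positive entries never expire; negative ones expire after the TTL.
        if value is not None or now - written_at < NEGATIVE_TTL_SECONDS:
            memory[key] = value
            return value
    try:
        value = fetch(key)  # None means a genuine "unverified" / "no label"
    except TransientError:
        memory[key] = None  # remembered for this process only, never on disk
        return None
    memory[key] = value
    disk[key] = (now, value)
    return value
```

Keeping the transient-failure branch away from `disk` is the part that stops a short upstream blip from being recorded as a day-long negative.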

Bonus

Any contract seen once keeps decoding through an upstream outage, so this is a small reliability win as well as a speed win.

Config (env, all optional)

  • CACHE_NEGATIVE_TTL_SECONDS (default 86400 — 1 day)
  • SOURCE_CACHE_MAX_ENTRIES (default 5000), SOURCE_CACHE_MAX_BYTES (default 256 MiB)
  • LABEL_CACHE_MAX_ENTRIES (default 50000)

Cache dirs (source-cache/, label-cache/) live under CACHE_DIR and are gitignored.
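
As a rough sketch of how these settings might be read, the snippet below parses the env vars with the defaults listed above and resolves the cache directories on each call rather than at import time. The helper names (`_int_env`, `cache_subdir`, and so on) and the fallback directory `"cache"` used when `CACHE_DIR` is unset are assumptions for illustration, not names taken from the PR.

```python
import os
from pathlib import Path
from typing import Tuple


def _int_env(name: str, default: int) -> int:
    """Read an optional integer env var, falling back to the default."""
    raw = os.environ.get(name, "").strip()
    return int(raw) if raw else default


def negative_ttl_seconds() -> int:
    return _int_env("CACHE_NEGATIVE_TTL_SECONDS", 86400)  # 1 day


def source_cache_limits() -> Tuple[int, int]:
    """Return (max entries, max bytes) for the source cache."""
    return (
        _int_env("SOURCE_CACHE_MAX_ENTRIES", 5000),
        _int_env("SOURCE_CACHE_MAX_BYTES", 256 * 1024 * 1024),  # 256 MiB
    )


def label_cache_max_entries() -> int:
    return _int_env("LABEL_CACHE_MAX_ENTRIES", 50000)


def cache_subdir(name: str) -> Path:
    # Resolved lazily on every call so a test can point CACHE_DIR elsewhere.
    # The "cache" fallback is a placeholder, not the project's real default.
    return Path(os.environ.get("CACHE_DIR", "cache")) / name
```

For example, `cache_subdir("source-cache")` and `cache_subdir("label-cache")` would both follow a `CACHE_DIR` override made after import.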

Tests

  • tests/conftest.py redirects CACHE_DIR to a per-test temp dir so caches never litter the repo or leak between tests.
  • New tests/test_disk_cache.py (roundtrip, MISS vs cached-None, TTL expiry, count/byte eviction, corrupt-file resilience).
  • Cross-process-restart persistence tests added to test_source_context.py and test_swiss_knife.py, including the "transient errors are not persisted" guarantee.

Full suite: 484 passed, 4 skipped. ruff clean; mypy clean on the changed modules.

🤖 Generated with Claude Code

codex and others added 2 commits June 9, 2026 09:27
The source-context (Etherscan verified source + ABI) and Swiss Knife label
caches were process-lifetime dicts, so every cron run re-fetched the same
immutable data. On GitHub Actions that was unavoidable; on the VPS the cache
dir (/srv/cache) is persistent, so we can reuse it across runs.

Add utils/disk_cache.py: a small file-backed JSON cache (one file per key)
with per-entry TTL and size-bounded LRU eviction, mirroring the existing
selector-cache pattern in calldata/decoder.py. Writes are atomic (temp +
os.replace) so overlapping hourly/multisig profiles can't read a half-written
entry. Resolved under CACHE_DIR lazily so tests can redirect it.

Wire it under the existing in-memory dicts in source_context and swiss_knife:
- Positive (found) entries never expire — verified source/labels are immutable
  per address.
- Negative (miss) entries get a 1-day TTL so a contract verified later, or an
  address that later gains a label, is picked up.
- Only genuine "unverified"/"no label" responses are persisted; transient
  request failures stay in-memory so an Etherscan/Swiss-Knife blip can't poison
  the disk cache as a day-long negative.

Tests redirect CACHE_DIR to a per-test temp dir (conftest) so caches never
litter the repo or leak between tests. Adds unit tests for disk_cache and
cross-process-restart persistence tests for both integrations.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

get() now touches the file mtime (os.utime) on a successful, non-expired
read, so eviction — which sorts by mtime — reflects least-recently-USED
rather than oldest-written. Without this, a contract re-read every cron run
could be evicted just for having been written early, defeating the cache's
purpose for long-pending Safe txs / proposals.

TTL stays keyed off the stored JSON write time, not mtime, so refreshing
recency on read never extends a negative entry's lifetime.

Adds a test that reads "a" before inserting "c" and asserts the unread "b"
is the one evicted.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>