Switch evmlog source from RPC getLogs to envio Postgres #402

@matheus1lva

Envio-side prerequisites tracked in chain-events/yearn-indexing-test#24: transactionIndex column, chains 100/146/80094 enabled, per-event coverage audit. Must land before EVMLOG_SOURCE=envio rollout.

Description

Kong currently fetches event logs by calling viem getLogs() against configured RPCs (packages/ingest/extract/evmlogs.ts:41). Direct RPC calls have been observed to drop logs, leaving holes in the evmlog table. The yearn internal envio indexer already maintains a complete, deduped event stream in Postgres. This spec replaces the RPC fetch with a direct pull from the envio DB, gated by an env var so RPC remains a fallback, and adds a maintenance script to backfill historical gaps from envio into kong.

Context

Current implementation

  • packages/ingest/extract/evmlogs.ts:37-48: EvmLogsExtractor.extract() calls rpcs.next(chainId, from).getLogs({ address, events, fromBlock, toBlock }) to fetch logs.
  • Result feeds hooks (packages/ingest/extract/evmlogs.ts:50-76) and is queued to mq.job.load.evmlog (:90-95).
  • packages/ingest/load/index.ts:55-87 upserts each row into evmlog and updates evmlog_strides.
  • evmlog schema: packages/db/migrations/sqls/20240214020032-eventsource-up.sql:1-15. PK (chain_id, address, signature, block_number, log_index, transaction_hash). transaction_index int4 NOT NULL. topics text[] NOT NULL. block_time timestamptz.

Target system

  • Envio replica DB at postgresql://envio_ix_7f3a:…@dpg-d75faobuibrs73bmodug-b.replica-cyan.virginia-postgres.render.com/envio_indexer_jvm1.
  • Schema envio holds typed per-event tables (one table per event, e.g. envio."Transfer", envio."StrategyReported", envio."V3RegistryNewEndorsedVault", …).
  • Common columns: chainId, vaultAddress (or registry/factory address), blockNumber, blockTimestamp (unix int), blockHash, transactionHash, transactionFrom, logIndex, plus event-specific args.
  • Missing vs kong evmlog requirements:
    • No transactionIndex column (in any event table inspected so far)
    • No topics[] array (only event_name + typed args)
    • No signature (topic0); must be derived from the event ABI
    • blockTimestamp is a unix int; kong needs timestamptz
  • Envio's transient envio.raw_events table has the right shape but is empty in production (rotated/cleared).
  • Envio indexer source: ~/Desktop/yearn/yearn-indexing-test/apps/indexer (config at apps/indexer/config.yaml, handlers at src/EventHandlers.ts).
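
The gaps listed above could be closed at hydration time. The sketch below is a minimal illustration, not kong's actual code: `EnvioRow`, `KongLog`, and `hydrateEnvioRow` are hypothetical names, the signature is assumed to come from a precomputed per-ABI map, and `transactionIndex` is assumed to be present only once the envio-side change has landed. Checksumming with viem's `getAddress()` is left out to keep the example dependency-free.

```typescript
// Hypothetical subset of the columns shared by envio's per-event tables.
interface EnvioRow {
  chainId: number;
  vaultAddress: string;
  blockNumber: number;
  blockTimestamp: number; // unix seconds, as stored by envio
  transactionHash: string;
  logIndex: number;
  transactionIndex?: number; // absent until envio persists it
}

// Hypothetical log shape carrying the fields kong's evmlog table requires.
interface KongLog {
  chainId: number;
  address: string;
  signature: string;
  topics: string[];
  blockNumber: bigint;
  blockTime: Date;
  logIndex: number;
  transactionHash: string;
  transactionIndex: number;
}

function hydrateEnvioRow(row: EnvioRow, signature: string): KongLog {
  if (row.transactionIndex === undefined) {
    // evmlog.transaction_index is NOT NULL, so refuse rather than guess a value.
    throw new Error(`envio row ${row.transactionHash}#${row.logIndex} has no transactionIndex`);
  }
  return {
    chainId: row.chainId,
    address: row.vaultAddress, // real code would checksum this with getAddress()
    signature,
    topics: [signature], // envio stores no topics[]; the spec uses [signature]
    blockNumber: BigInt(row.blockNumber),
    blockTime: new Date(row.blockTimestamp * 1000), // unix seconds -> timestamp
    logIndex: row.logIndex,
    transactionHash: row.transactionHash,
    transactionIndex: row.transactionIndex,
  };
}
```

Throwing on a missing `transactionIndex` keeps partially migrated envio tables from producing rows that would violate the NOT NULL constraint on insert.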

Coverage

  • Envio chains observed in envio.\"Transfer\": 1, 10, 137, 250, 8453, 42161, 747474. Kong also indexes 100 (gnosis), 146 (sonic), 80094 (bera) — must verify per chain whether envio is configured.
  • Envio events (per envio schema): yearn V2/V3 vault, ERC4626 Deposit/Withdraw, V2 Registry/Registry2, V3 Registry/RoleManager/RoleManagerFactory/VaultFactory/Splitter/YieldSplitter/Accountant, VeyfiGaugeRegistered, VotingEscrowCreated, StakingPoolAdded, plus all V3 vault role/config events.
  • Kong ABIs (sample, full audit needed against config/abis.yaml): erc4626, yearn/2/vault, yearn/2/strategy, yearn/2/registry, yearn/2/registry2, yearn/3/vault, yearn/3/strategy, yearn/3/registry, yearn/3/registry2, yearn/3/registry3, yearn/3/vaultFactory, yearn/3/roleManager, yearn/3/roleManagerFactory, yearn/3/debtManagerFactory, yearn/governance/votingEscrow, yearn/staking/registry/{juiced,v3,opboost,veyfi}, yearn/staking/pool. Each has events under packages/ingest/abis/<path>/event/.../hook.ts.

Tasks

1. Envio-side prerequisites (~/Desktop/yearn/yearn-indexing-test/apps/indexer)

  • Add transactionIndex to field_selection.transaction_fields in apps/indexer/config.yaml so envio captures it.
  • Update src/EventHandlers.ts to persist transactionIndex on every event entity write.
  • Update schema.graphql to add transactionIndex: Int! on every event type.
  • Run envio migration to add the column to all per-event tables (or accept that a full reindex is needed — confirm with envio team).
  • Audit config.yaml against kong's config/abis.yaml:
    • Add chains kong indexes that envio doesn't (100, 146, 80094 — verify).
    • Add any kong events not yet handled by envio. The PR description must list each missing (abiPath, event_name) tuple and the envio config + handler additions needed.
  • Document in this spec PR which events stay RPC-only (if any) until envio catches up.

2. Envio DB client in kong (packages/ingest)

  • Add envio DB connection: env vars ENVIO_DATABASE_URL (full URL, recommended) or split ENVIO_PG_HOST/PORT/USER/PASSWORD/DATABASE/SSL. Validate via zod in packages/lib/types.ts or co-located.
  • New file packages/ingest/extract/envio.ts:
    • Export pool: pg.Pool for the envio replica (read-only — set default_transaction_read_only=on).
    • Export fetchEnvioLogs(chainId, address, from, to, abiPath) returning rows shaped like viem's Log (so the existing extract() flow needs no downstream changes).
    • Build a static map abiPath → { event_name → { table: 'envio."Transfer"', signature: '0x…', topics: ['0x…'], paramsToArgs(row): {...} } }. Keep the map alongside each ABI under packages/ingest/abis/<path>/envio.ts so it ships with the ABI definition.
    • Per call, run UNION ALL across the relevant per-event tables filtered by chainId, vaultAddress=$, blockNumber BETWEEN $ AND $.
    • Hydrate each row into the kong log shape: address checksummed via getAddress(), topics: [signature], args from typed envio columns, blockNumber: bigint, blockTime via to_timestamp(blockTimestamp), transactionIndex from envio (after envio change above).
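
The UNION ALL query described above might be assembled along these lines. This is a sketch under assumptions: `EnvioEventSource`, `addressColumn`, and `buildEnvioQuery` are hypothetical names, envio's columns are assumed to be camelCase (hence the quoted identifiers), and `to_jsonb(t)` is used here only as one way to carry differing event-specific columns through a single result shape. Block bounds are passed as decimal strings so bigint values can be forwarded with `.toString()`.

```typescript
// Hypothetical entry from the per-ABI envio map described above.
interface EnvioEventSource {
  eventName: string; // e.g. "Transfer"
  table: string; // pre-quoted identifier, e.g. 'envio."Transfer"'
  addressColumn: string; // e.g. "vaultAddress"; registry tables may differ
}

interface EnvioQuery {
  text: string;
  values: [number, string, string, string];
}

const IDENTIFIER = /^[A-Za-z_][A-Za-z0-9_]*$/;
const ENVIO_TABLE = /^envio\."[A-Za-z_][A-Za-z0-9_]*"$/;

function buildEnvioQuery(
  sources: EnvioEventSource[],
  chainId: number,
  address: string,
  fromBlock: string,
  toBlock: string,
): EnvioQuery {
  if (sources.length === 0) throw new Error("no envio-covered events to query");
  const branches = sources.map((s) => {
    // These names are interpolated into SQL, so accept plain identifiers only.
    if (!IDENTIFIER.test(s.eventName) || !IDENTIFIER.test(s.addressColumn) || !ENVIO_TABLE.test(s.table)) {
      throw new Error(`unsafe identifier in envio map entry: ${s.eventName}`);
    }
    return [
      `SELECT '${s.eventName}' AS event_name, t."chainId", t."${s.addressColumn}" AS address,`,
      `       t."blockNumber", to_timestamp(t."blockTimestamp") AS block_time,`,
      `       t."transactionHash", t."logIndex", to_jsonb(t) AS args`,
      `FROM ${s.table} t`,
      `WHERE t."chainId" = $1 AND lower(t."${s.addressColumn}") = lower($2)`,
      `  AND t."blockNumber" BETWEEN $3 AND $4`,
    ].join("\n");
  });
  return {
    // The same four positional parameters are reused by every branch.
    text: branches.join("\nUNION ALL\n") + `\nORDER BY "blockNumber", "logIndex"`,
    values: [chainId, address, fromBlock, toBlock],
  };
}
```

The returned `{ text, values }` pair matches the query-config object accepted by `pg`'s `pool.query()`, so the result could be passed to the read-only pool directly.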

3. Source switch with env-var gating

  • Add to env validation: EVMLOG_SOURCE=rpc|envio (default rpc) and per-chain override EVMLOG_SOURCE_<chainId>=rpc|envio.
  • Add helper evmlogSource(chainId): 'rpc' | 'envio' resolving global default + per-chain override.
  • In packages/ingest/extract/evmlogs.ts:37-48, branch on evmlogSource(chainId):
    • rpc → existing rpcs.next(...).getLogs(...).
    • envio → fetchEnvioLogs(chainId, address, from, to, abiPath).
    • For events not in the abi's envio map, fall back to RPC for that event only (so rollout is safe). Log a warning with (chainId, abiPath, event_name) so coverage gaps are observable.
  • Hooks pipeline (evmlogs.ts:50-76) and load queue stay unchanged.
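
The resolution order and the per-event fallback above can be sketched as follows. This is illustrative only: the spec's helper takes just `chainId`, while this version accepts the environment as a second argument so it can be exercised without touching `process.env`; `parseSource`, `partitionEvents`, and the `warn` callback are hypothetical names.

```typescript
type EvmlogSource = "rpc" | "envio";
type Env = Record<string, string | undefined>;

function parseSource(name: string, raw: string | undefined): EvmlogSource | undefined {
  if (raw === undefined || raw === "") return undefined;
  if (raw === "rpc" || raw === "envio") return raw;
  throw new Error(`${name} must be "rpc" or "envio", got "${raw}"`);
}

// Per-chain override wins, then the global setting, then the "rpc" default.
function evmlogSource(chainId: number, env: Env): EvmlogSource {
  const chainKey = `EVMLOG_SOURCE_${chainId}`;
  return (
    parseSource(chainKey, env[chainKey]) ??
    parseSource("EVMLOG_SOURCE", env["EVMLOG_SOURCE"]) ??
    "rpc"
  );
}

// Splits an ABI's events into those served by envio and those that
// fall back to RPC, warning once per uncovered event.
function partitionEvents(
  chainId: number,
  abiPath: string,
  eventNames: string[],
  envioCovered: Set<string>,
  warn: (message: string) => void,
): { envio: string[]; rpc: string[] } {
  const envio: string[] = [];
  const rpc: string[] = [];
  for (const name of eventNames) {
    if (envioCovered.has(name)) {
      envio.push(name);
    } else {
      rpc.push(name);
      warn(`envio coverage gap: chainId=${chainId} abiPath=${abiPath} event_name=${name}`);
    }
  }
  return { envio, rpc };
}
```

Rejecting unknown values instead of silently defaulting to `rpc` makes a typo such as `EVMLOG_SOURCE=envoi` fail at startup rather than leave a chain on the wrong source.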

4. Maintenance/backfill script (packages/ingest/scripts/envio-sync.ts)

  • Bun CLI taking --chain <id>, --abi-path <path> (optional, defaults to all), --address <0x…> (optional), --from <block>, --to <block> (defaults to envio's max block per chain), --dry-run.
  • For each (chain, address, abiPath, event_name) covered by envio:
    • Select rows from envio tables in the range.
    • Left-anti-join against kong evmlog on (chain_id, address, signature, block_number, log_index, transaction_hash).
    • For missing rows: hydrate into log shape, run the same hooks as EvmLogsExtractor.extract() would (re-use requireHooks + topic filter + hook.module.default), and upsertEvmLog per chunk (batch ~500 rows).
    • Update evmlog_strides via strider.add for the swept range so subsequent fanouts skip.
  • Emit a summary per (chain, address, abiPath): envio_count, kong_count, inserted, skipped.
  • Idempotent — safe to re-run.
  • Add to package.json of packages/ingest: "envio-sync": "bun run scripts/envio-sync.ts".
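
The anti-join and batching steps above could look like this when done in memory. It is a sketch under assumptions: `EvmLogKey`, `evmlogKey`, `findMissing`, and `chunk` are hypothetical helpers, and the real script may prefer to run the anti-join in SQL. Addresses, signatures, and hashes are lowercased in the key so that checksummed and plain-text forms compare equal.

```typescript
// Fields that make up kong's evmlog primary key.
interface EvmLogKey {
  chainId: number;
  address: string;
  signature: string;
  blockNumber: bigint | number;
  logIndex: number;
  transactionHash: string;
}

// Builds a case-insensitive string key over the primary-key fields.
function evmlogKey(k: EvmLogKey): string {
  return [
    k.chainId,
    k.address.toLowerCase(),
    k.signature.toLowerCase(),
    String(k.blockNumber),
    k.logIndex,
    k.transactionHash.toLowerCase(),
  ].join(":");
}

// Returns the envio rows whose key is absent from kong (a left anti-join).
function findMissing<T extends EvmLogKey>(envioRows: T[], kongRows: EvmLogKey[]): T[] {
  const present = new Set(kongRows.map(evmlogKey));
  return envioRows.filter((row) => !present.has(evmlogKey(row)));
}

// Splits rows into batches, e.g. ~500 rows per upsert.
function chunk<T>(rows: T[], size: number): T[][] {
  if (size < 1) throw new Error("chunk size must be at least 1");
  const out: T[][] = [];
  for (let i = 0; i < rows.length; i += size) out.push(rows.slice(i, i + size));
  return out;
}
```

Because `findMissing` only ever returns rows that kong lacks, a second run over the same range after a successful insert yields an empty list, which is the idempotency property the script needs.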

5. Tests

  • packages/ingest/extract/envio.spec.ts — mock pg pool, verify fetchEnvioLogs returns log shapes equivalent to viem getLogs (validated via EvmLogSchema).
  • Source-switch test in evmlogs.spec.ts — assert evmlogSource honors global and per-chain overrides.
  • Backfill script test with a fixture of envio rows + kong rows that establish a known gap; verify only missing rows insert.

6. Documentation

  • Update CLAUDE.md (or docs/) with the new env vars and rollout playbook (start EVMLOG_SOURCE=rpc, flip per chain after envio coverage validated).
  • Add a section to the spec PR description listing every kong event NOT covered by envio at merge time, with the corresponding envio-side change required.

Acceptance Criteria

  • EVMLOG_SOURCE=envio causes EvmLogsExtractor.extract() to read from envio Postgres for every event covered by the per-ABI envio map; falls back to RPC for uncovered events with a warning.
  • EVMLOG_SOURCE_<chainId> overrides the global default for that chain only.
  • Default behavior with neither env var set is unchanged (RPC).
  • Per-event hydration produces rows that pass EvmLogSchema.array().parse(...) and upsert into evmlog without PK conflicts on re-run.
  • Envio config + handlers updated to persist transactionIndex; kong rows sourced from envio carry the real transaction_index value.
  • Envio config covers all chains kong indexes (or the spec PR documents the remaining gap and the events still routed via RPC).
  • `bun --filter ingest run envio-sync --chain 1 --abi-path yearn/3/vault` inserts envio rows missing from kong's `evmlog`, no rows are deleted, and `evmlog_strides` is extended for swept ranges.
  • Re-running the backfill against the same range inserts zero new rows.
  • Tests in §5 pass (`bun --filter ingest test`).
  • PR description enumerates every kong (abiPath, event_name) NOT yet in envio along with the envio-side change required.

Technical Notes

  • Read-only: envio DB connection MUST be read-only (`default_transaction_read_only=on` on the connection) — that DB is owned by the envio team.
  • Connection target: the URL above is a Render replica; confirm with the envio team that it's the right read endpoint and that it's safe to load it with the maintenance script (consider concurrency caps).
  • Dedup PK: kong's `evmlog` PK includes `signature, block_number, log_index, transaction_hash`. Envio per-event tables have no `signature` column — kong constructs it from the event name via the ABI map. If two ABIs share an event name with the same selector (e.g. `Transfer`), the address-scoped lookup avoids cross-contamination because hydration is per-(abiPath, address).
  • Address case: envio stores addresses as plain text; kong canonicalizes via `getAddress()`. Compare lowercased on join, store checksummed on insert.
  • Block time: convert envio's int `blockTimestamp` to `timestamptz` via `to_timestamp(blockTimestamp)` before insert.
  • Hook re-execution in backfill: hooks call onchain via multicall; backfilling large historical ranges will hammer archive RPCs. The script should chunk by block range and respect `RPC_BATCH_SIZE`.
  • No schema migration on kong's `evmlog`: per the agreed approach, `transaction_index` stays NOT NULL — the envio side is the one that changes to provide it.
  • Strider semantics: only update `evmlog_strides` for ranges where envio is a complete source for that address. If only some events of an ABI are envio-covered and the rest still RPC, do NOT extend strides from the backfill (would skip the RPC-only events on next fanout).
  • Rollout: stage with `EVMLOG_SOURCE_1=envio` first, observe parity for ~24h, then expand chain by chain.
  • Secrets: don't commit the envio DSN. Add to `.env.example` as `ENVIO_DATABASE_URL=` placeholder only.
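
The strider rule in the notes above reduces to a coverage check before any `evmlog_strides` update. A minimal sketch, assuming a hypothetical `canExtendStrides` helper that receives the ABI's full event list and the set of events present in its envio map:

```typescript
// True only when envio is a complete source for every event of the ABI.
// An empty event list is treated as "not covered" to stay on the safe side.
function canExtendStrides(abiEvents: string[], envioCovered: Set<string>): boolean {
  return abiEvents.length > 0 && abiEvents.every((name) => envioCovered.has(name));
}
```

In the backfill script this check would guard the `strider.add` call, so a partially covered ABI still gets its missing rows inserted but leaves the stride untouched for the next RPC fanout.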
