Skip to content

Feat/analytics api hardening#1

Merged
Abdssamie merged 11 commits into
mainfrom
feat/analytics-api-hardening
Apr 21, 2026
Merged

Feat/analytics api hardening#1
Abdssamie merged 11 commits into
mainfrom
feat/analytics-api-hardening

Conversation

@Abdssamie

Copy link
Copy Markdown
Owner

No description provided.

- Convert ingestBatch to pure append-only mutation
- Move visitor, session, and pageview updates to async worker
- Implement background shard compaction for read optimization
- Remove ensureSite overhead from ingestion path
- Add SHA-256 hash cache for writeKey verification
@greptile-apps

greptile-apps Bot commented Apr 20, 2026

Copy link
Copy Markdown

Greptile Summary

This PR hardens the analytics component across three main dimensions: query correctness, write efficiency, and API surface clarity.

Query correctness: The previous getOverview, getTimeseries, and topDimension queries used .take() with fixed limits tied to an old shard-count constant, silently capping results for high-traffic sites. This PR replaces them with a readAll() helper that paginates through all matching rows, and adds buildExactRangePlan for precise raw/hourly/daily range stitching at query edges. Bounce rate and average session duration are now computed from the sessions table rather than rollup shard aggregates.

Write efficiency: updateRollupShards previously issued one DB read+write per dimension per event (O(N × dims) ops). The new accumulateRollupShards / flushRollupShards pattern collects deltas in-memory first, reducing DB ops to O(unique shard keys) per batch — verified by the new ingest.perf.test.ts.

API surface: exposeApi is split into exposeAnalyticsApi (read-only queries) and exposeAdminApi (admin mutations + cleanup). New queries: getTopSources, getTopMediums, getEventPropertyBreakdown. listRawEvents and listSessions now use Convex pagination instead of .take().

Key issues found:

  • registerDefaultAnalyticsCrons defaults runUntilComplete: true, contradicting the README's explicit guidance to keep this unset for normal production cron jobs.
  • Dashboard.tsx computes to with an unrounded Date.now() keyed only on rangeIndex, producing stale data over time and defeating Convex query caching.
  • compactShards action loops until all compactable targets are exhausted with no outer cap, risking Convex action time limits for large backlogs.
  • retryFailedEvents is a public mutation scheduled via the public api reference — inconsistent with the PR's own change of compactShards to internalAction + internal.* scheduling.
  • improvement_plan.md is an internal planning document committed to the public repo.

Confidence Score: 3/5

Safe to merge after fixing runUntilComplete: true default in registerDefaultAnalyticsCrons — that default contradicts the README and could cause unexpected billing in production.

The core improvements (unbounded shard reads, in-batch delta accumulation, pagination, split API surfaces) are well-tested and address real production problems. However, the registerDefaultAnalyticsCrons P1 issue ships a cron helper that contradicts the project's own documented guidance, and the Dashboard's stale/unrounded timestamp is a real UX + performance regression. These issues are targeted and fixable without rearchitecting.

src/client/maintenance.ts (runUntilComplete default), src/react/Dashboard.tsx (stale to timestamp), src/component/compaction.ts (unbounded outer loop)

Important Files Changed

Filename Overview
src/client/maintenance.ts New file providing runCleanupSite, runPruneExpired, and registerDefaultAnalyticsCrons helpers; registerDefaultAnalyticsCrons defaults runUntilComplete: true, contradicting README guidance to keep this unset for normal production crons.
src/react/Dashboard.tsx New React dashboard component with time-range selector; to computed via unrounded Date.now() keyed only on rangeIndex, causing stale data over time and defeating Convex query caching.
src/component/compaction.ts Refactored from single mutation to internalAction + internalMutation + internalQuery; new granular (bucketStart, dimension) per-pair compaction; outer action loop runs until no targets remain with no outer iteration cap.
src/component/analytics.ts Major rewrite: replaced .take() limits with unbounded readAll() paginate loops, added buildExactRangePlan for raw/hourly/daily edge handling, new getTopSources/getTopMediums/getEventPropertyBreakdown queries, and paginated listRawEvents/listSessions.
src/component/ingest.ts Key optimisation: rollup writes now accumulate in-memory deltas map before flushing (O(unique-shard-keys) DB ops vs O(N×dims)); added retryFailedEvents mutation with aggregationAttempts reset; structured retry-delay scheduling for failed events.
src/client/api.ts Adds exposeAnalyticsApi and exposeAdminApi facades splitting read/admin surfaces; new getTopSources, getTopMediums, cleanupSite (action), pruneExpired, retryFailedEvents wrappers with auth; upgrades listRawEvents/listSessions to pagination.
src/component/maintenance.ts Cleanup refactored to action + paginated internalMutation loop; removes pageViews table; adds resolveSiteForCleanup query; budget-bounded delete helpers correctly propagate hasMore.
src/component/schema.ts Removes pageViews table; adds by_siteId_and_sessionId_and_startedAt index to sessions; adds rollupShardCount to site settings; schema is clean.

Sequence Diagram

sequenceDiagram
    participant Browser
    participant HTTP as HTTP Ingest Route
    participant Ingest as ingestBatch (mutation)
    participant Worker as aggregateEventBatch (internalMutation)
    participant DB as Convex DB
    participant Compact as compactShards (internalAction)
    participant Dashboard as Dashboard Queries

    Browser->>HTTP: POST /ingest with writeKey + events
    HTTP->>DB: ensureSite (mutation)
    HTTP->>Ingest: ingestBatch
    Ingest->>DB: insert events (append-only)
    Ingest->>Worker: scheduler.runAfter(0)

    Worker->>DB: ctx.db.get(eventId) x N
    Worker->>Worker: accumulateRollupShards (in-memory)
    Worker->>DB: upsertVisitor / upsertSession
    Worker->>DB: flushRollupShards (unique-key writes)
    Worker->>DB: patch event to done

    Note over Worker: on error: mark failed, schedule retry (5s delay)

    Worker->>Compact: scheduler.runAfter(0) after last batch

    loop while targets exist
        Compact->>DB: listCompactionTargets (internalQuery)
        Compact->>DB: compactShardPairPage (internalMutation)
    end

    Dashboard->>DB: buildExactRangePlan - raw events + hourly + daily rollups
    Dashboard->>DB: querySessionStats (for bounce rate / duration)
    Dashboard-->>Browser: overview / timeseries / top-dimension results
Loading

Comments Outside Diff (1)

  1. src/client/maintenance.ts, line 1150-1152 (link)

    P1 runUntilComplete: true in default crons contradicts README guidance

    registerDefaultAnalyticsCrons hard-codes runUntilComplete: true for the cleanup job. The README explicitly warns:

    "Use runUntilComplete: true only for one-off backfills or after downtime. For normal production cron, keep it unset and let each run delete a bounded batch."

    With runUntilComplete: true, a single cron invocation will loop through sequential ctx.runMutation calls until every expired row in the site is deleted. On a busy deployment this could exhaust Convex action time, generate hundreds of mutation calls per cron tick, and produce unexpected billing. The limit: 100 parameter bounds each mutation, but not the total number of mutations in the action.

    If a runUntilComplete mode is desired for catch-up scenarios, expose it as an option (e.g., options.runUntilComplete) rather than defaulting to true.

Reviews (2): Last reviewed commit: "feat: batch cleanup and configurable sha..." | Re-trigger Greptile

Comment thread src/component/analytics.ts Outdated
Comment thread src/component/ingest.ts
Comment thread src/component/analytics.ts
@Abdssamie Abdssamie merged commit fe296e1 into main Apr 21, 2026
2 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant