Skip to content

Fix money-correctness & concurrency bugs (#26–#30)#41

Merged
tcconnally merged 1 commit into
mainfrom
fix/money-correctness
Jun 24, 2026
Merged

Fix money-correctness & concurrency bugs (#26–#30)#41
tcconnally merged 1 commit into
mainfrom
fix/money-correctness

Conversation

@tcconnally

Copy link
Copy Markdown
Contributor

This PR resolves 5 critical bugs affecting billing accuracy and concurrency in the Plutus billing engine.

🐛 Bugs Fixed

#26 — Stripe webhook idempotency race condition

Problem: Concurrent Stripe webhook retries could both pass the stripe_event_seen() check and double-credit an account.

Fix:

  • Changed mark_stripe_event to atomically claim events (INSERT OR IGNORE) and return bool (True if newly inserted)
  • Move the claim to the start of handle_webhook_event (before side-effects)
  • Add unmark_stripe_event for rollback if the side-effect raises
  • Prevents double-crediting on concurrent webhook delivery

Tests: test_concurrent_webhook_credits_once, test_mark_stripe_event_atomicity


#27 — /v1/usage batch not atomic

Problem: API endpoint looped over events, committing each individually. If event N was malformed (missing provider), events 1..N-1 already committed. Client's whole-batch retry would double-count those events.

Fix:

  • Validate ALL events before recording ANY (fail-fast with 400 if any malformed)
  • Thread commit=False through record_usage, add_ledger, and log_alert
  • Wrap the entire batch in try: ... conn.commit() except: conn.rollback(); return 400
  • Ensures all-or-nothing semantics per request

Tests: test_malformed_second_event_records_nothing


#28 — Prepaid credit has no hard-stop

Problem: Orgs with prepaid credit could debit indefinitely past zero balance (no enforcement).

Fix:

  • Add block_over_balance config flag (default False to preserve current behavior)
  • Add MeterResult.over_balance field
  • When flag is on AND org has ever had credit (topup/grant ledger row exists) AND balance - cost < 0, reject the event (not recorded)
  • Return 402 with "error": "prepaid credit exhausted" in /v1/usage endpoint

Tests: test_block_over_balance, test_block_over_balance_allows_within_limit, test_over_balance_returns_402


#29 — Checkout session uses client metadata.amount_usd

Problem: _apply_checkout_completed preferred metadata.amount_usd (client-supplied) over amount_total (what Stripe actually collected). Client could manipulate credited amount.

Fix:

  • Prefer obj["amount_total"] / 100.0 (Stripe's collected amount in cents)
  • Fallback to metadata.amount_usd only when amount_total is missing/None
  • Ensures credited amount matches what was actually paid

Tests: test_checkout_prefers_amount_total


#30 — SQLite concurrency: missing busy_timeout

Problem: No PRAGMA busy_timeout meant concurrent writers immediately errored on lock contention instead of waiting.

Fix:

  • Add conn.execute("PRAGMA busy_timeout=5000") in db.connect()
  • Concurrent writers now wait up to 5 seconds for locks
  • Improves reliability under concurrent webhook/API load
  • Added comment in add_ledger noting that balance is authoritative via SUM(delta_usd) and balance_after is a best-effort audit column

✅ Testing

All fixes include comprehensive regression tests:

  • test_engine.py: 41 tests passing (added 6 new)
  • test_server.py: 20 tests passing (added 2 new test classes)
  • No existing tests broken
  • python3 -m compileall -q plutus_agent passes (Python 3.9/3.11/3.12 compatible)

📝 Compatibility

  • ✅ Maintains Python 3.9+ compatibility
  • ✅ No new dependencies (stdlib-only)
  • ✅ Backward-compatible (all new features opt-in via config flags)
  • ✅ Existing integrations unaffected (default commit=True preserved)

Closes #26
Closes #27
Closes #28
Closes #29
Closes #30

This commit resolves 5 critical bugs affecting billing accuracy and concurrency:

#26 — Stripe webhook idempotency race condition
  - Changed mark_stripe_event to atomically claim events and return bool
  - Move claim to start of handle_webhook_event (before side-effects)
  - Add unmark_stripe_event for rollback on side-effect failure
  - Prevents concurrent Stripe retries from double-crediting accounts

#27 — /v1/usage batch not atomic
  - Validate ALL events before recording ANY (fail-fast on malformed events)
  - Thread commit=False through record_usage, add_ledger, log_alert
  - Wrap batch recording in try/commit + except/rollback transaction
  - Prevents partial commits when client retries a failed batch

#28 — Prepaid credit has no hard-stop
  - Add block_over_balance config flag and MeterResult.over_balance field
  - Check org has credit history before blocking (via topup/grant ledger rows)
  - Reject events that would push balance negative when flag is on
  - Return 402 with 'prepaid credit exhausted' error in API endpoint

#29 — Checkout session uses client-supplied metadata.amount_usd
  - Prefer Stripe's amount_total (actual collected amount in cents)
  - Fallback to metadata.amount_usd only when amount_total is missing
  - Prevents client from manipulating credited amount via metadata

#30 — SQLite concurrency: missing busy_timeout
  - Add PRAGMA busy_timeout=5000 in db.connect()
  - Concurrent writers now wait up to 5s instead of immediate error
  - Improves reliability under concurrent webhook/API load

All changes include comprehensive regression tests in test_engine.py and
test_server.py. Existing tests remain green.
@tcconnally tcconnally merged commit 65e3bc0 into main Jun 24, 2026
3 of 4 checks passed
@tcconnally tcconnally deleted the fix/money-correctness branch June 24, 2026 01:48
tcconnally added a commit that referenced this pull request Jun 24, 2026
…ELOG

Both 1.0 launch-gate milestones are merged in code but the version was stuck
at 0.5.1 with no changelog entries for either:

- v0.6 money & concurrency correctness (#26#30, #38) — merged via #41/#44.
- v0.7 security hardening (#31#37) — merged via #42.

Bump version (pyproject + __init__) to 0.7.0 and add [0.6.0] and [0.7.0]
CHANGELOG sections so the version reflects reality. No behavior change; no
launch switches flipped — the roadmap's v0.7 exit (external security review)
remains a separate human gate before public launch.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
tcconnally added a commit that referenced this pull request Jun 24, 2026
…ELOG (#45)

Both 1.0 launch-gate milestones are merged in code but the version was stuck
at 0.5.1 with no changelog entries for either:

- v0.6 money & concurrency correctness (#26#30, #38) — merged via #41/#44.
- v0.7 security hardening (#31#37) — merged via #42.

Bump version (pyproject + __init__) to 0.7.0 and add [0.6.0] and [0.7.0]
CHANGELOG sections so the version reflects reality. No behavior change; no
launch switches flipped — the roadmap's v0.7 exit (external security review)
remains a separate human gate before public launch.

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment