fix(storage): use active backend in admin cli and mcp#38
Conversation
📝 WalkthroughSummary by CodeRabbit
WalkthroughThis PR resolves a critical bug where the admin API, CLI tools, and MCP server bypassed the swappable storage abstraction and queried SQLite directly. All three consumers are refactored to use ChangesStorage Abstraction Migration
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related PRs
Suggested labels
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 3
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
humane_proxy/storage/redis.py (1)
159-170:⚠️ Potential issue | 🟠 Major | 🏗️ Heavy liftMake
RedisStore.stats()match the/admin/statspayload contract across backends
humane_proxy/api/admin.pyreturnsget_store().stats()verbatim (plusgenerated_at), buthumane_proxy/storage/redis.py::RedisStore.stats()only providestotal_escalations,by_category, and a hardcodedaverage_risk_score: 0.0, whilehumane_proxy/storage/sqlite.pyandhumane_proxy/storage/postgres.pyalso returnby_day,top_sessions,by_stage, andhourly_last_24h. This creates a backend-dependent response shape for clients/dashboards expecting the enhanced breakdowns.Fix by either computing the missing Redis aggregations or returning explicit empty defaults for
by_day,top_sessions,by_stage, andhourly_last_24h(and keepingaverage_risk_scoreconsistent with the other backends’ type/semantics).🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@humane_proxy/storage/redis.py` around lines 159 - 170, RedisStore.stats() currently only returns total_escalations, by_category and a hardcoded average_risk_score, causing the /admin/stats payload to differ from SQLite/Postgres backends; update RedisStore.stats (method RedisStore.stats in humane_proxy/storage/redis.py) to include the same keys as the other backends by adding explicit defaults for by_day, top_sessions, by_stage, and hourly_last_24h (e.g., empty lists/dicts) and make average_risk_score match the other backends’ type/semantics (not a hardcoded 0.0 if they use None or computed value) so get_store().stats() returns a consistent shape across backends; keep existing total_escalations and by_category logic using self._client and self._key() and only add the missing fields with appropriate empty/default values or compute them if feasible.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@humane_proxy/api/admin.py`:
- Around line 79-86: The _parse_iso_timestamp function mishandles timezone-aware
datetimes by using .replace(tzinfo=timezone.utc) (which keeps the clock time)
instead of converting to UTC and it also needs exception chaining; update the
parsing path to call datetime.fromisoformat(value) and, if the result is
timezone-aware or naive, convert to UTC with .astimezone(timezone.utc) (or
attach UTC for naive inputs) and return .timestamp(), and when catching
ValueError re-raise the HTTPException with the original exception chained (e.g.,
raise HTTPException(400, f"Invalid {param_name} format: {value}") from err) so
tracebacks are preserved; refer to _parse_iso_timestamp and the ValueError
handler in your change.
In `@humane_proxy/storage/postgres.py`:
- Around line 176-202: The Postgres queries that build by_day and hourly
currently rely on the session TimeZone and must be pinned to UTC to match
SQLite; update the SQL in the by_day comprehension (SELECT
to_timestamp(timestamp)::date ...) to explicitly convert to UTC (e.g. SELECT
(to_timestamp(timestamp) AT TIME ZONE 'UTC')::date AS day ...) and update the
hourly query (to_char(to_timestamp(timestamp), 'HH24') ...) to use UTC as well
(e.g. to_char(to_timestamp(timestamp) AT TIME ZONE 'UTC', 'HH24') AS hour) so
both by_day and hourly use UTC bucketing; keep the same Python variables
(by_day, hourly, cutoff_24h) and parameter usage.
In `@tests/test_admin_api.py`:
- Around line 21-37: The fixture _seeded_db currently returns db_path but
doesn't teardown the cached store; change it to yield db_path and after the
yield call reset_store() (and any necessary cleanup) so the process-wide
singleton from get_store() is cleared on fixture teardown; ensure you still call
reset_store() before initializing the store and then call it again in the
teardown to avoid a stale _store pointing at the temporary DB.
---
Outside diff comments:
In `@humane_proxy/storage/redis.py`:
- Around line 159-170: RedisStore.stats() currently only returns
total_escalations, by_category and a hardcoded average_risk_score, causing the
/admin/stats payload to differ from SQLite/Postgres backends; update
RedisStore.stats (method RedisStore.stats in humane_proxy/storage/redis.py) to
include the same keys as the other backends by adding explicit defaults for
by_day, top_sessions, by_stage, and hourly_last_24h (e.g., empty lists/dicts)
and make average_risk_score match the other backends’ type/semantics (not a
hardcoded 0.0 if they use None or computed value) so get_store().stats() returns
a consistent shape across backends; keep existing total_escalations and
by_category logic using self._client and self._key() and only add the missing
fields with appropriate empty/default values or compute them if feasible.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: fbe9e864-4ab3-4d0f-9944-88062dbff3ea
📒 Files selected for processing (7)
humane_proxy/api/admin.pyhumane_proxy/cli.pyhumane_proxy/mcp_server.pyhumane_proxy/storage/postgres.pyhumane_proxy/storage/redis.pyhumane_proxy/storage/sqlite.pytests/test_admin_api.py
| def _parse_iso_timestamp(value: str | None, param_name: str) -> float | None: | ||
| if not value: | ||
| return None | ||
| try: | ||
| rec["triggers"] = json.loads(rec["triggers"]) | ||
| except Exception: | ||
| pass | ||
| return rec | ||
|
|
||
|
|
||
| def _get_conn() -> sqlite3.Connection: | ||
| return sqlite3.connect(_get_db_path(), check_same_thread=False) | ||
| return datetime.fromisoformat(value).replace(tzinfo=timezone.utc).timestamp() | ||
| except ValueError: | ||
| raise HTTPException(400, f"Invalid {param_name} format: {value}") | ||
|
|
There was a problem hiding this comment.
Timezone handling bug when input includes explicit timezone.
Using .replace(tzinfo=timezone.utc) on a timezone-aware datetime replaces the timezone metadata without converting the time. For example, "2026-01-01T12:00:00+05:00" should convert to 07:00 UTC, but this code produces 12:00 UTC.
Additionally, per static analysis: chain the exception with from err or from None to preserve traceback context.
🐛 Proposed fix
def _parse_iso_timestamp(value: str | None, param_name: str) -> float | None:
if not value:
return None
try:
- return datetime.fromisoformat(value).replace(tzinfo=timezone.utc).timestamp()
- except ValueError:
- raise HTTPException(400, f"Invalid {param_name} format: {value}")
+ dt = datetime.fromisoformat(value)
+ if dt.tzinfo is None:
+ dt = dt.replace(tzinfo=timezone.utc)
+ return dt.timestamp()
+ except ValueError as err:
+ raise HTTPException(400, f"Invalid {param_name} format: {value}") from err📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| def _parse_iso_timestamp(value: str | None, param_name: str) -> float | None: | |
| if not value: | |
| return None | |
| try: | |
| rec["triggers"] = json.loads(rec["triggers"]) | |
| except Exception: | |
| pass | |
| return rec | |
| def _get_conn() -> sqlite3.Connection: | |
| return sqlite3.connect(_get_db_path(), check_same_thread=False) | |
| return datetime.fromisoformat(value).replace(tzinfo=timezone.utc).timestamp() | |
| except ValueError: | |
| raise HTTPException(400, f"Invalid {param_name} format: {value}") | |
| def _parse_iso_timestamp(value: str | None, param_name: str) -> float | None: | |
| if not value: | |
| return None | |
| try: | |
| dt = datetime.fromisoformat(value) | |
| if dt.tzinfo is None: | |
| dt = dt.replace(tzinfo=timezone.utc) | |
| return dt.timestamp() | |
| except ValueError as err: | |
| raise HTTPException(400, f"Invalid {param_name} format: {value}") from err |
🧰 Tools
🪛 Ruff (0.15.14)
[warning] 85-85: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling
(B904)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@humane_proxy/api/admin.py` around lines 79 - 86, The _parse_iso_timestamp
function mishandles timezone-aware datetimes by using
.replace(tzinfo=timezone.utc) (which keeps the clock time) instead of converting
to UTC and it also needs exception chaining; update the parsing path to call
datetime.fromisoformat(value) and, if the result is timezone-aware or naive,
convert to UTC with .astimezone(timezone.utc) (or attach UTC for naive inputs)
and return .timestamp(), and when catching ValueError re-raise the HTTPException
with the original exception chained (e.g., raise HTTPException(400, f"Invalid
{param_name} format: {value}") from err) so tracebacks are preserved; refer to
_parse_iso_timestamp and the ValueError handler in your change.
| by_day = { | ||
| r["day"].isoformat() if hasattr(r["day"], "isoformat") else str(r["day"]): r["cnt"] | ||
| for r in conn.execute( | ||
| """SELECT to_timestamp(timestamp)::date as day, COUNT(*) as cnt | ||
| FROM escalations GROUP BY day ORDER BY day DESC LIMIT 30""" | ||
| ).fetchall() | ||
| } | ||
| top_sessions = conn.execute( | ||
| """SELECT session_id, COUNT(*) as cnt, AVG(risk_score) as avg_score | ||
| FROM escalations GROUP BY session_id ORDER BY cnt DESC LIMIT 10""" | ||
| ).fetchall() | ||
| by_stage = { | ||
| r["stage_reached"]: r["cnt"] | ||
| for r in conn.execute( | ||
| "SELECT stage_reached, COUNT(*) as cnt FROM escalations GROUP BY stage_reached" | ||
| ).fetchall() | ||
| } | ||
| cutoff_24h = datetime.now(timezone.utc).timestamp() - 86400 | ||
| hourly = { | ||
| str(r["hour"]): r["cnt"] | ||
| for r in conn.execute( | ||
| """SELECT to_char(to_timestamp(timestamp), 'HH24') as hour, COUNT(*) as cnt | ||
| FROM escalations WHERE timestamp >= %s | ||
| GROUP BY hour ORDER BY hour""", | ||
| (cutoff_24h,), | ||
| ).fetchall() | ||
| } |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# List likely storage files
ls -R
# Find stats() implementation and /admin/stats wiring
rg -n "def stats\(|/admin/stats|admin/stats|stats\(" humane_proxy || true
# Inspect Postgres stats file around the cited lines
sed -n '1,260p' humane_proxy/storage/postgres.py | nl -ba | sed -n '140,240p' || true
# Locate SQLite stats implementation and look for unixepoch / UTC handling
fd -t f "sqlite*.py" humane_proxy/storage || true
rg -n "unixepoch|strftime|AT TIME ZONE|timezone" humane_proxy/storage || true
# Find any explicit "SET TIME ZONE" / timezone pinning in the codebase
rg -n "SET TIME ZONE|TimeZone.*UTC|timezone.*UTC" humane_proxy || true
# Identify the Postgres table schema / column type for escalations.timestamp
rg -n "CREATE TABLE.*escalations|escalations.*timestamp" humane_proxy || trueRepository: Vishisht16/Humane-Proxy
Length of output: 4770
🏁 Script executed:
#!/bin/bash
set -euo pipefail
sed -n '40,110p' humane_proxy/storage/postgres.py | sed -n '1,120p'
echo "----"
sed -n '140,260p' humane_proxy/storage/postgres.py | sed -n '1,200p'
echo "==== SQLITE ===="
sed -n '40,120p' humane_proxy/storage/sqlite.py | sed -n '1,120p'
echo "----"
sed -n '160,280p' humane_proxy/storage/sqlite.py | sed -n '1,200p'
echo "==== TESTS ===="
rg -n "by_day|hourly|hourly_last_24h|/admin/stats|get_stats|stats\(" tests humane_proxy/api/admin.py humane_proxy/storage/sqlite.py humane_proxy/storage/postgres.py || trueRepository: Vishisht16/Humane-Proxy
Length of output: 15421
🏁 Script executed:
#!/bin/bash
set -euo pipefail
sed -n '1,120p' tests/test_storage_backends.py | nl -ba
echo "----"
sed -n '1,160p' tests/test_admin_api.py | nl -ba
echo "----"
rg -n "by_day|hourly_last_24h|stats\(" tests -S || trueRepository: Vishisht16/Humane-Proxy
Length of output: 110
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "==== tests/test_storage_backends.py ===="
cat -n tests/test_storage_backends.py | sed -n '1,220p'
echo "----"
echo "==== tests/test_admin_api.py ===="
cat -n tests/test_admin_api.py | sed -n '1,220p'
echo "----"
echo "==== any assertions on by_day/hourly_last_24h ===="
rg -n "by_day|hourly_last_24h|to_timestamp|to_char|unixepoch|strftime\('%H'" tests humane_proxy/storage || trueRepository: Vishisht16/Humane-Proxy
Length of output: 10050
Pin by_day / hourly_last_24h bucketing to UTC for consistent /admin/stats across backends.
escalations.timestamp is stored as UTC epoch seconds (both SQLite and Postgres), but the Postgres by_day/hourly_last_24h queries bucket using to_timestamp(...)::date and to_char(..., 'HH24'), which depend on the PostgreSQL session TimeZone. SQLite uses 'unixepoch' (UTC), so boundaries can shift when TimeZone isn’t UTC.
🕐 Proposed UTC-pinned queries
- """SELECT to_timestamp(timestamp)::date as day, COUNT(*) as cnt
+ """SELECT (to_timestamp(timestamp) AT TIME ZONE 'UTC')::date as day, COUNT(*) as cnt
FROM escalations GROUP BY day ORDER BY day DESC LIMIT 30"""- """SELECT to_char(to_timestamp(timestamp), 'HH24') as hour, COUNT(*) as cnt
+ """SELECT to_char(to_timestamp(timestamp) AT TIME ZONE 'UTC', 'HH24') as hour, COUNT(*) as cnt
FROM escalations WHERE timestamp >= %s
GROUP BY hour ORDER BY hour""",🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@humane_proxy/storage/postgres.py` around lines 176 - 202, The Postgres
queries that build by_day and hourly currently rely on the session TimeZone and
must be pinned to UTC to match SQLite; update the SQL in the by_day
comprehension (SELECT to_timestamp(timestamp)::date ...) to explicitly convert
to UTC (e.g. SELECT (to_timestamp(timestamp) AT TIME ZONE 'UTC')::date AS day
...) and update the hourly query (to_char(to_timestamp(timestamp), 'HH24') ...)
to use UTC as well (e.g. to_char(to_timestamp(timestamp) AT TIME ZONE 'UTC',
'HH24') AS hour) so both by_day and hourly use UTC bucketing; keep the same
Python variables (by_day, hourly, cutoff_24h) and parameter usage.
| @pytest.fixture() | ||
| def _seeded_db(tmp_path, monkeypatch): | ||
| """Create a temp DB with 3 escalation rows.""" | ||
| """Create a temp store with 3 escalation rows.""" | ||
| db_path = tmp_path / "test_admin.db" | ||
| import sqlite3 | ||
| conn = sqlite3.connect(str(db_path)) | ||
| with conn: | ||
| conn.execute( | ||
| """CREATE TABLE escalations ( | ||
| id INTEGER PRIMARY KEY AUTOINCREMENT, | ||
| session_id TEXT, category TEXT, risk_score REAL, | ||
| triggers TEXT, timestamp REAL, | ||
| message_hash TEXT, stage_reached INTEGER, reasoning TEXT | ||
| )""" | ||
| ) | ||
| rows = [ | ||
| ("sess-1", "self_harm", 1.0, '["keyword:kill myself"]', time.time(), None, 1, None), | ||
| ("sess-2", "criminal_intent", 0.75, '["keyword:how to make a bomb"]', time.time(), None, 1, None), | ||
| ("sess-1", "self_harm", 1.0, '["pattern:self_annihilation"]', time.time(), None, 2, "LLM reasoning"), | ||
| ] | ||
| conn.executemany( | ||
| "INSERT INTO escalations (session_id, category, risk_score, triggers, timestamp, message_hash, stage_reached, reasoning) VALUES (?,?,?,?,?,?,?,?)", | ||
| rows, | ||
| ) | ||
| monkeypatch.setattr("humane_proxy.api.admin._get_db_path", lambda: str(db_path)) | ||
| monkeypatch.setenv("HUMANE_PROXY_DB_PATH", str(db_path)) | ||
|
|
||
| from humane_proxy.config import reload_config | ||
| from humane_proxy.storage.factory import get_store, reset_store | ||
|
|
||
| reload_config() | ||
| reset_store() | ||
| store = get_store() | ||
| store.init() | ||
| store.log("sess-1", "self_harm", 1.0, ["keyword:kill myself"]) | ||
| store.log("sess-2", "criminal_intent", 0.75, ["keyword:how to make a bomb"]) | ||
| store.log("sess-1", "self_harm", 1.0, ["pattern:self_annihilation"], stage_reached=2, reasoning="LLM reasoning") | ||
| return db_path |
There was a problem hiding this comment.
Add fixture teardown to reset the cached store.
get_store() returns a process-wide cached singleton. The fixture calls reset_store() on entry, but there is no teardown: after the test, monkeypatch restores HUMANE_PROXY_DB_PATH, yet the cached _store still points at the now-deleted tmp_path database. Any later test (especially in another module) that calls get_store() without first resetting will inherit this stale store, which can cause order-dependent, flaky failures. Yield and reset on teardown.
🧹 Proposed teardown
reload_config()
reset_store()
store = get_store()
store.init()
store.log("sess-1", "self_harm", 1.0, ["keyword:kill myself"])
store.log("sess-2", "criminal_intent", 0.75, ["keyword:how to make a bomb"])
store.log("sess-1", "self_harm", 1.0, ["pattern:self_annihilation"], stage_reached=2, reasoning="LLM reasoning")
- return db_path
+ yield db_path
+ reset_store()
+ reload_config()🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@tests/test_admin_api.py` around lines 21 - 37, The fixture _seeded_db
currently returns db_path but doesn't teardown the cached store; change it to
yield db_path and after the yield call reset_store() (and any necessary cleanup)
so the process-wide singleton from get_store() is cleared on fixture teardown;
ensure you still call reset_store() before initializing the store and then call
it again in the teardown to avoid a stale _store pointing at the temporary DB.
|
@aarushlohit I don't accept stray PRs without verified issue assignment |
Description
This PR fixes an architectural inconsistency where the Admin API, CLI tools (
hp escalations,hp session), and MCP server queries bypassed the configured storage backend and accessed SQLite directly usingsqlite3.connect().Changes Made
get_store()).Motivation
The project supports multiple storage backends through a swappable storage architecture. However, several admin and tooling paths bypassed this abstraction and queried SQLite directly, causing inconsistent results whenever Redis or PostgreSQL was configured.
This change restores backend consistency and ensures all interfaces (Admin API, CLI, MCP) reflect data stored in the active backend.
Fixes #37
Type of Change
Checklist
pytest tests/ -v)