Summary
HumaneProxy's Risk Trajectory & Time-Decay engine computes a per-session
rolling window of risk scores, detects spikes, and applies a +0.25 score
boost when a sudden elevation is found. This is a powerful safety feature —
but it contains a critical design gap: session_id is a caller-supplied,
unvalidated string.
There is no mechanism in the pipeline that binds a session_id to an
authenticated identity. This means:
- An adversarial user can deliberately supply another user's session_id to
inject artificially low scores into that user's trajectory window,
diluting their accumulated risk and suppressing future spike detection.
- Conversely, injecting high scores into a victim's window can trigger
false-positive spike boosts and cause legitimate messages to be blocked
and operators to be flooded with false escalation alerts.
- In a multi-tenant deployment (the norm for any SaaS using this as
middleware), these two attacks require zero authentication — just
knowledge of another user's session_id, which is often a predictable
value like a database row ID, username, or email hash.
Attack Scenarios
Scenario A — Risk Dilution (Bypass)
An attacker who wants to avoid the spike boost shapes their own session window
before sending a harmful message. Padding the window with benign, low-score
messages is not enough: it pulls the time-decay-weighted mean toward zero, so
the delta to the harmful message is large and the boost fires. Interleaving
medium-risk messages instead raises the weighted_mean until the delta falls
below spike_delta. The +0.25 boost is then never applied, and the final score
stays at the raw value, close to the escalate_threshold instead of far above it.
# Attacker sends 4 innocent messages → window mean ≈ 0.03
check("The weather is nice today", session_id="attacker-42") # 0.02
check("Can you help me with Python?", session_id="attacker-42") # 0.03
check("What are some good recipes?", session_id="attacker-42") # 0.04
check("Tell me about Paris", session_id="attacker-42") # 0.03
# Now the harmful message:
check("How do I obtain prescription opioids without a doctor?",
      session_id="attacker-42") # raw score: 0.72
# delta = 0.72 − 0.03 = 0.69 → spike detected, boost applied → 0.72 + 0.25 = 0.97
# BUT: if they also interleaved medium-risk messages first to raise the mean:
# weighted_mean ≈ 0.50, delta = 0.72 − 0.50 = 0.22 < spike_delta (0.35)
# → NO spike boost → final score = 0.72 → just above threshold (if threshold = 0.7)
# Fine-tuning the mean is trivially achievable.
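The arithmetic in the comments above can be reproduced with a short sketch. The engine's real weighting formula is not shown in this report, so the code assumes an exponential time decay with a made-up half-life. The function names `weighted_mean` and `score_with_trajectory` are hypothetical; only the 0.35 spike delta and the 0.25 boost come from the text.

```python
SPIKE_DELTA = 0.35    # from the example above
SPIKE_BOOST = 0.25    # from the summary
HALF_LIFE_S = 3600.0  # assumption: the real decay constant is not given


def weighted_mean(history, now):
    """Exponentially time-decayed mean of (timestamp, score) pairs."""
    weights = [0.5 ** ((now - ts) / HALF_LIFE_S) for ts, _ in history]
    total = sum(weights)
    if total == 0:
        return 0.0
    return sum(w * s for w, (_, s) in zip(weights, history)) / total


def score_with_trajectory(history, raw, now):
    """Return (final_score, spiked) for a new raw score against the window."""
    delta = raw - weighted_mean(history, now)
    spiked = delta >= SPIKE_DELTA
    final = min(1.0, raw + SPIKE_BOOST) if spiked else raw
    return final, spiked


# Messages sent 60 s apart; the harmful message arrives at t = 300.
benign = [(60 * i, s) for i, s in enumerate([0.02, 0.03, 0.04, 0.03])]
primed = [(60 * i, s) for i, s in enumerate([0.45, 0.50, 0.55, 0.50])]

benign_result = score_with_trajectory(benign, 0.72, now=300)
primed_result = score_with_trajectory(primed, 0.72, now=300)
print(benign_result)
print(primed_result)
```

Under these assumptions the benign-padded window triggers the boost (final score about 0.97), while the primed window keeps the delta near 0.22 and the score stays at the raw 0.72.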
Scenario B — Cross-Session Trajectory Poisoning (DoS / False Positive)
An attacker who knows a victim's session_id (for example, because it is the
victim's username) sends high-score messages under the victim's session. Those
messages are recorded in the victim's window, so any spike boost or escalation
alert they trigger is attributed to the victim, and a care-response block may
be sent to an unsuspecting user. The victim's next message, however innocent,
is then evaluated against an inflated mean.
# Attacker poisons victim's session
check("I hate everything and want to die", session_id="victim-user-99") # score ≈ 0.95
check("Nobody cares if I disappear", session_id="victim-user-99") # score ≈ 0.85
# Victim sends an innocent next message:
check("Can you recommend a good movie?", session_id="victim-user-99") # raw: 0.05
# weighted_mean from poisoned window ≈ 0.90
# delta = 0.05 − 0.90 = −0.85 → no spike (drop, not rise)
# BUT the window is now poisoned for every later message the victim sends:
# the baseline stays elevated until the injected scores decay (24h default).
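A toy model makes the poisoning concrete. The class below is a hypothetical stand-in for the trajectory store, not HumaneProxy's implementation. It uses a plain unweighted mean over an assumed window of five scores and ignores time decay, which is enough to show that any caller can move another session's baseline.

```python
from collections import defaultdict, deque
from statistics import fmean

WINDOW = 5  # assumption: the real window length is not given


class FlatTrajectoryStore:
    """Toy store keyed by raw session_id, with no notion of an owner."""

    def __init__(self):
        self._windows = defaultdict(lambda: deque(maxlen=WINDOW))

    def record(self, session_id, score):
        self._windows[session_id].append(score)

    def baseline(self, session_id):
        window = self._windows[session_id]
        return fmean(window) if window else 0.0


store = FlatTrajectoryStore()

# The attacker writes under the victim's key; nothing stops it.
store.record("victim-user-99", 0.95)
store.record("victim-user-99", 0.85)
poisoned_baseline = store.baseline("victim-user-99")

# The victim's own innocent message only partly lowers the baseline.
store.record("victim-user-99", 0.05)
after_victim = store.baseline("victim-user-99")
print(poisoned_baseline, after_victim)
```

In this simplified model the baseline sits near 0.90 after the two injected scores and is still above 0.6 after the victim's innocent message.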
Root Cause
The session_id field accepted by check(), check_async(), and the
/v1/check HTTP endpoint is purely caller-supplied with no binding to an
authenticated identity. The trajectory store (SQLite/Redis/Postgres) uses it
as a raw key:
- get_session_risk(session_id): fetches any session by its string key
- list_recent_escalations(session_id=...): filters by raw string
- DELETE /admin/sessions/{id}: deletes by raw string
There is no concept of a session owner — the session namespace is globally
flat and writable by any caller.
Affected Components
| Component | Role in Bug |
|---|---|
| humane_proxy/pipeline.py (or equivalent) | check() / check_async() accept session_id without ownership validation |
| humane_proxy/trajectory.py (or equivalent) | Rolling window read/write uses raw session_id as the store key |
| humane_proxy/storage/ (sqlite/redis/postgres backends) | No per-session ownership column or ACL |
| REST proxy endpoint POST /v1/chat/completions | Forwards session_id from request body without verification |
| MCP tool check_message_safety | Exposes session_id parameter to any connected AI agent |
Proposed Fix
1. Bind session ownership at creation time.
When a session_id is first written to the store, record a session_owner
token (e.g., a hash of the API key or client IP + secret). On all subsequent
writes to that session, verify the token matches:
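import hmac  # module-level import

def _assert_session_owner(self, session_id: str, owner_token: str) -> None:
    existing_owner = self._store.get_session_owner(session_id)
    if existing_owner is None:
        # First write to this session: record the caller as its owner.
        self._store.set_session_owner(session_id, owner_token)
    elif not hmac.compare_digest(existing_owner.encode(), owner_token.encode()):
        # Constant-time comparison; reject writes from any other caller.
        raise SessionOwnershipError(
            f"session_id '{session_id}' belongs to a different caller"
        )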
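The check above can be exercised end to end with an in-memory store. This is a minimal sketch under stated assumptions: `owner_token_for`, `InMemoryOwnerStore`, and `claim_or_get` are hypothetical names, and the owner token is assumed to be an HMAC-SHA256 of the caller's API key under a server-side secret. The claim is made in one dictionary operation so that two callers cannot both pass a "no owner yet" check.

```python
import hashlib
import hmac


class SessionOwnershipError(Exception):
    """Raised when a caller writes to a session it does not own."""


def owner_token_for(api_key: str, server_secret: bytes) -> str:
    """Derive a stable, non-reversible owner token from the caller's API key."""
    return hmac.new(server_secret, api_key.encode(), hashlib.sha256).hexdigest()


class InMemoryOwnerStore:
    """Hypothetical stand-in for the ownership part of a storage backend."""

    def __init__(self):
        self._owners = {}

    def claim_or_get(self, session_id: str, owner_token: str) -> str:
        # setdefault stores the token only if the session is unclaimed and
        # returns whichever token ends up recorded, in one dict operation.
        return self._owners.setdefault(session_id, owner_token)


def assert_session_owner(store, session_id, owner_token):
    recorded = store.claim_or_get(session_id, owner_token)
    if not hmac.compare_digest(recorded, owner_token):
        raise SessionOwnershipError("session belongs to a different caller")


SECRET = b"example-only-secret"  # assumption: loaded from config in practice
store = InMemoryOwnerStore()
alice = owner_token_for("alice-api-key", SECRET)
mallory = owner_token_for("mallory-api-key", SECRET)

assert_session_owner(store, "victim-user-99", alice)  # first write claims
assert_session_owner(store, "victim-user-99", alice)  # owner may keep writing
try:
    assert_session_owner(store, "victim-user-99", mallory)
    rejected = False
except SessionOwnershipError:
    rejected = True
print(rejected)
```

Deriving the token from the API key rather than the client IP avoids locking out legitimate users whose address changes mid-session. That is a design preference, not something the report prescribes.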
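2. Add an owner_token column to all storage backends.
Leave the column nullable so that NULL marks a session that has not been
claimed yet. A NOT NULL DEFAULT '' column would give every existing session an
empty-string owner, which is not None and matches no caller's token.
ALTER TABLE sessions ADD COLUMN owner_token TEXT DEFAULT NULL;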
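For the SQLite backend, the migration and a first-caller-wins claim might look like the sketch below. It assumes a nullable owner_token column in which NULL means "unclaimed", a minimal one-column sessions table, and a hypothetical helper named `claim_session`. The project's real schema is not shown in this report.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sessions (session_id TEXT PRIMARY KEY)")  # assumed schema
conn.execute("INSERT INTO sessions (session_id) VALUES ('legacy-1')")

# Nullable column: NULL marks a session that no caller has claimed yet.
conn.execute("ALTER TABLE sessions ADD COLUMN owner_token TEXT")


def claim_session(conn, session_id, owner_token):
    """Return True if owner_token owns session_id after this call."""
    with conn:  # commit on success, roll back on error
        conn.execute(
            "INSERT OR IGNORE INTO sessions (session_id) VALUES (?)",
            (session_id,),
        )
        # Only an unclaimed row is updated, so the first caller wins.
        conn.execute(
            "UPDATE sessions SET owner_token = ? "
            "WHERE session_id = ? AND owner_token IS NULL",
            (owner_token, session_id),
        )
        row = conn.execute(
            "SELECT owner_token FROM sessions WHERE session_id = ?",
            (session_id,),
        ).fetchone()
    return row[0] == owner_token


first = claim_session(conn, "legacy-1", "token-a")
second = claim_session(conn, "legacy-1", "token-b")
repeat = claim_session(conn, "legacy-1", "token-a")
fresh = claim_session(conn, "new-2", "token-b")
print(first, second, repeat, fresh)
```

Pre-existing sessions are claimed by whichever caller writes to them first after the migration. Operators who need stricter handling could expire legacy sessions instead.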
3. Document the threat in SECURITY.md and the configuration reference.
Until a full fix ships, operators should be warned to treat session_id as a
sensitive value and avoid using predictable identifiers (usernames, emails,
sequential IDs).
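As an interim mitigation, integrators can generate session identifiers that are hard to guess. The sketch below shows two options using only the Python standard library. The helper names are hypothetical and the secret shown is a placeholder.

```python
import hashlib
import hmac
import secrets


def new_session_id() -> str:
    """Random, unguessable identifier for a brand-new session."""
    return secrets.token_urlsafe(32)


def derived_session_id(user_id: str, server_secret: bytes) -> str:
    """Stable per-user identifier that cannot be computed without the secret."""
    return hmac.new(server_secret, user_id.encode(), hashlib.sha256).hexdigest()


SECRET = b"example-only-secret"  # assumption: kept out of source control
random_id = new_session_id()
stable_id = derived_session_id("victim-user-99", SECRET)
print(random_id, stable_id)
```

A random identifier suits short-lived sessions. The keyed derivation suits deployments that need the same session_id for a user across requests without storing a mapping. Neither replaces ownership binding, since a leaked identifier can still be replayed.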