[Security/Bug] Session Risk Trajectory Poisoning via Cross-Session session_id Collision #24

@ionfwsrijan

Summary

HumaneProxy's Risk Trajectory & Time-Decay engine computes a per-session
rolling window of risk scores, detects spikes, and applies a +0.25 score
boost when a sudden elevation is found. This is a powerful safety feature,
but it contains a critical design gap: session_id is a caller-supplied,
unvalidated string.

There is no mechanism in the pipeline that binds a session_id to an
authenticated identity. This means:

  1. An adversarial user can deliberately supply another user's session_id
    to inject artificially low scores into that user's trajectory window,
    diluting their accumulated risk and suppressing future spike detection.
  2. Conversely, injecting high scores into a victim's window can trigger
    false-positive spike boosts and cause legitimate messages to be blocked
    and operators to be flooded with false escalation alerts.
  3. In a multi-tenant deployment (the norm for any SaaS using this as
    middleware), these two attacks require zero authentication — just
    knowledge of another user's session_id, which is often a predictable
    value like a database row ID, username, or email hash.

Attack Scenarios

Scenario A — Risk Dilution (Bypass)

An attacker who wants to avoid detection shapes their own session window
before sending a harmful message. Padding the window with benign, low-score
messages is counterproductive: it pulls the time-decay-weighted mean toward
zero, so the harmful message produces a large delta and triggers the boost.
Padding it with medium-risk messages instead raises the weighted_mean until
the delta falls below spike_delta. The spike boost is then never applied and
the final score stays at the raw score, which lands below the
escalate_threshold whenever a message depends on the boost to cross it.

# Attacker sends 4 innocent messages → window mean ≈ 0.03
check("The weather is nice today",      session_id="attacker-42")  # 0.02
check("Can you help me with Python?",   session_id="attacker-42")  # 0.03
check("What are some good recipes?",    session_id="attacker-42")  # 0.04
check("Tell me about Paris",            session_id="attacker-42")  # 0.03

# Now the harmful message:
check("How do I obtain prescription opioids without a doctor?",
      session_id="attacker-42")  # raw score: 0.72
# delta = 0.72 − 0.03 = 0.69  → spike detected, boost applied → 0.72 + 0.25 = 0.97

# BUT: if they first interleave medium-risk messages to raise the mean:
# weighted_mean ≈ 0.50, delta = 0.72 − 0.50 = 0.22 < spike_delta (0.35)
# → NO spike boost → final score = 0.72 instead of 0.97.
# With threshold = 0.7 this message still escalates, but any raw score that
# needs the boost to cross the threshold now passes.
# Fine-tuning the mean is trivially achievable.
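
The arithmetic in the comments above can be reproduced with a small model of the spike check. This is only a sketch, not HumaneProxy's implementation: the function name final_score is hypothetical, time decay is omitted (a plain mean stands in for the weighted mean), and spike_delta=0.35 and boost=0.25 are taken from the numbers quoted in this issue.

```python
from statistics import fmean

def final_score(window, raw, spike_delta=0.35, boost=0.25):
    """Return (score, spiked) for a new raw score against a prior window.

    Hypothetical model: a spike is a rise of at least `spike_delta` over
    the window mean; a spike adds `boost`, capped at 1.0.
    """
    baseline = fmean(window) if window else 0.0
    spiked = (raw - baseline) >= spike_delta
    score = min(1.0, raw + boost) if spiked else raw
    return round(score, 2), spiked

# Benign padding: mean 0.03, delta 0.69, so the boost is applied.
benign = [0.02, 0.03, 0.04, 0.03]
print(final_score(benign, 0.72))

# Medium-risk padding: mean 0.50, delta 0.22, so the boost is suppressed.
padded = [0.45, 0.50, 0.55, 0.50]
print(final_score(padded, 0.72))
```

The same raw score of 0.72 ends up at 0.97 or at 0.72 depending only on what the caller chose to put in the window beforehand.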

Scenario B — Cross-Session Trajectory Poisoning (DoS / False Positive)

An attacker who knows a victim's session_id (e.g., because it is their
username) sends high-score messages under the victim's session. Those
messages can themselves raise escalation alerts attributed to the victim,
and the victim's next message, however innocent, is evaluated against an
inflated mean. The result can be bogus escalation alerts and potentially a
care-response block sent to an unsuspecting user.

# Attacker poisons victim's session
check("I hate everything and want to die",  session_id="victim-user-99")  # score ≈ 0.95
check("Nobody cares if I disappear",        session_id="victim-user-99")  # score ≈ 0.85

# Victim sends an innocent next message:
check("Can you recommend a good movie?",    session_id="victim-user-99")  # raw: 0.05
# weighted_mean from poisoned window ≈ 0.90
# delta = 0.05 − 0.90 = −0.85 → no spike (drop, not rise)
# BUT the window is now poisoned for the victim's later messages:
# the baseline stays elevated until time decay clears it (24h default).
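
To make the collision concrete, here is a minimal sketch of a flat store keyed only by session_id. The names windows, record, and baseline are hypothetical, and a plain mean again stands in for the time-decay-weighted mean. The point is that nothing ties the writer to the key.

```python
from collections import defaultdict
from statistics import fmean

# Hypothetical flat store: session_id -> list of (caller, score).
windows = defaultdict(list)

def record(caller, session_id, score):
    """Append a score; `caller` is never checked against the key."""
    windows[session_id].append((caller, score))

def baseline(session_id):
    """Plain mean of the stored scores (time decay omitted)."""
    scores = [s for _, s in windows[session_id]]
    return fmean(scores) if scores else 0.0

# The attacker writes into the victim's window.
record("attacker", "victim-user-99", 0.95)
record("attacker", "victim-user-99", 0.85)

# The victim's benign message is compared against a baseline they never produced.
victim_raw = 0.05
delta = victim_raw - baseline("victim-user-99")
record("victim", "victim-user-99", victim_raw)

foreign = [c for c, _ in windows["victim-user-99"] if c != "victim"]
print(round(delta, 2), len(foreign))
```

Two of the three entries in the victim's window come from a different caller, and the store has no way to tell them apart.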

Root Cause

The session_id field accepted by check(), check_async(), and the
/v1/check HTTP endpoint is purely caller-supplied with no binding to an
authenticated identity. The trajectory store (SQLite/Redis/Postgres) uses it
as a raw key:

  • get_session_risk(session_id) — fetches any session by its string key
  • list_recent_escalations(session_id=...) — filters by raw string
  • DELETE /admin/sessions/{id} — deletes by raw string

There is no concept of a session owner — the session namespace is globally
flat and writable by any caller.


Affected Components

| Component | Role in Bug |
| --- | --- |
| humane_proxy/pipeline.py (or equivalent) | check() / check_async() accept session_id without ownership validation |
| humane_proxy/trajectory.py (or equivalent) | Rolling window read/write uses raw session_id as the store key |
| humane_proxy/storage/ (sqlite/redis/postgres backends) | No per-session ownership column or ACL |
| REST proxy endpoint POST /v1/chat/completions | Forwards session_id from request body without verification |
| MCP tool check_message_safety | Exposes session_id parameter to any connected AI agent |

Proposed Fix

1. Bind session ownership at creation time.

When a session_id is first written to the store, record a session_owner
token (e.g., a hash of the API key or client IP + secret). On all subsequent
writes to that session, verify the token matches:

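import hmac

def _assert_session_owner(self, session_id: str, owner_token: str) -> None:
    """Claim an unowned session, or reject a caller who does not own it."""
    existing_owner = self._store.get_session_owner(session_id)
    if existing_owner is None:
        # First write: the caller becomes the owner.
        self._store.set_session_owner(session_id, owner_token)
    elif not hmac.compare_digest(existing_owner.encode(), owner_token.encode()):
        # Constant-time comparison avoids leaking the token through timing.
        raise SessionOwnershipError(
            f"session_id {session_id!r} belongs to a different caller"
        )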
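
One possible way to derive the owner token and enforce it is sketched below, assuming the caller is identified by an API key. SERVER_SECRET, owner_token(), and the InMemoryOwners class are hypothetical stand-ins, not part of HumaneProxy. A keyed hash (HMAC) is used so the stored token cannot be recomputed from a leaked API key list without the server secret.

```python
import hashlib
import hmac

SERVER_SECRET = b"replace-with-a-random-secret"  # assumption: loaded from config

class SessionOwnershipError(Exception):
    """Raised when a caller writes to a session it does not own."""

def owner_token(api_key: str) -> str:
    """Derive a stable, non-reversible owner token from the caller's API key."""
    return hmac.new(SERVER_SECRET, api_key.encode(), hashlib.sha256).hexdigest()

class InMemoryOwners:
    """Hypothetical stand-in for a storage backend's owner table."""

    def __init__(self):
        self._owners = {}

    def assert_owner(self, session_id: str, token: str) -> None:
        # setdefault claims the session on first write and returns the owner.
        existing = self._owners.setdefault(session_id, token)
        if not hmac.compare_digest(existing, token):
            raise SessionOwnershipError(session_id)

owners = InMemoryOwners()
owners.assert_owner("victim-user-99", owner_token("victim-api-key"))
try:
    owners.assert_owner("victim-user-99", owner_token("attacker-api-key"))
    rejected = False
except SessionOwnershipError:
    rejected = True
print(rejected)
```

A client IP is a weaker basis for the token than an API key, since addresses are shared behind NAT and change over time.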

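2. Add an owner column to all storage backends.

-- Nullable on purpose: NULL marks a session that has not been claimed yet,
-- which is what the `is None` check in step 1 relies on.
ALTER TABLE sessions ADD COLUMN owner_token TEXT;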
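
For the SQLite backend, the migration and an atomic claim could look roughly like this. The sketch assumes a sessions table keyed by a session_id column (the real schema is not shown in this issue) and a nullable owner_token column where NULL means unclaimed. claim_or_verify is a hypothetical helper.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed minimal schema; the real sessions table is not shown in this issue.
conn.execute("CREATE TABLE sessions (session_id TEXT PRIMARY KEY)")
conn.execute("INSERT INTO sessions VALUES ('legacy-session')")

# Nullable column: NULL means "not yet claimed".
conn.execute("ALTER TABLE sessions ADD COLUMN owner_token TEXT")

def claim_or_verify(conn, session_id, token):
    """Return True if `token` owns the session after this call."""
    conn.execute(
        "INSERT OR IGNORE INTO sessions (session_id) VALUES (?)", (session_id,)
    )
    # A single UPDATE claims only unowned rows, so racing callers cannot both win.
    conn.execute(
        "UPDATE sessions SET owner_token = ? "
        "WHERE session_id = ? AND owner_token IS NULL",
        (token, session_id),
    )
    row = conn.execute(
        "SELECT owner_token FROM sessions WHERE session_id = ?", (session_id,)
    ).fetchone()
    return row[0] == token

print(claim_or_verify(conn, "legacy-session", "token-a"))  # first caller claims it
print(claim_or_verify(conn, "legacy-session", "token-b"))  # second caller is refused
```

Doing the claim in one conditional UPDATE, instead of a read followed by a write, closes the window in which two callers could both see an unowned session.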

3. Document the threat in SECURITY.md and the configuration reference.

Until a full fix ships, operators should be warned to treat session_id as a
sensitive value and avoid using predictable identifiers (usernames, emails,
sequential IDs).
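
As an interim mitigation, integrators can generate session identifiers from a cryptographically secure source instead of reusing usernames or row IDs. A minimal sketch using Python's standard secrets module follows; new_session_id is a hypothetical helper name.

```python
import secrets

def new_session_id() -> str:
    """Return an unguessable, URL-safe session identifier (32 random bytes)."""
    return secrets.token_urlsafe(32)

sid = new_session_id()
print(sid)
```

This does not fix the missing ownership check, but it removes the "predictable value" precondition that both attack scenarios rely on.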

