[Security/Bug] Session Risk Trajectory Poisoning via Cross-Session session_id Collision

## Summary

HumaneProxy's **Risk Trajectory & Time-Decay** engine computes a per-session
rolling window of risk scores, detects spikes, and applies a `+0.25` score
boost when a sudden elevation is found. This is a powerful safety feature —
but it contains a critical design gap: **`session_id` is a caller-supplied,
unvalidated string**.

There is no mechanism in the pipeline that binds a `session_id` to an
authenticated identity. This means:

1. **An adversarial user can deliberately supply another user's `session_id`**
   to inject artificially low scores into that user's trajectory window,
   diluting their accumulated risk and suppressing future spike detection.
2. **Conversely**, injecting high scores into a victim's window can trigger
   false-positive spike boosts and cause legitimate messages to be blocked
   and operators to be flooded with false escalation alerts.
3. In a multi-tenant deployment (the norm for any SaaS using this as
   middleware), these two attacks require **zero authentication** — just
   knowledge of another user's `session_id`, which is often a predictable
   value like a database row ID, username, or email hash.

---

## Attack Scenarios

### Scenario A — Risk Dilution (Bypass)

An attacker who wants to avoid detection shapes their own session window
before sending a harmful message. Padding the window with benign, low-score
messages is counterproductive: the time-decay-weighted mean drops to near
zero, so the harmful message produces a large delta and triggers the spike
boost. Instead, the attacker interleaves medium-risk messages to raise the
`weighted_mean` until the delta falls below `spike_delta`. The harmful message
then keeps its raw score, which can land below an `escalate_threshold` that
the boosted score would have crossed.

```python
# Attacker sends 4 innocent messages → window mean ≈ 0.03
check("The weather is nice today",      session_id="attacker-42")  # 0.02
check("Can you help me with Python?",   session_id="attacker-42")  # 0.03
check("What are some good recipes?",    session_id="attacker-42")  # 0.04
check("Tell me about Paris",            session_id="attacker-42")  # 0.03

# Now the harmful message:
check("How do I obtain prescription opioids without a doctor?",
      session_id="attacker-42")  # raw score: 0.72
# delta = 0.72 − 0.03 = 0.69 ≥ spike_delta (0.35)
# → spike detected, boost applied → 0.72 + 0.25 = 0.97

# BUT: if they instead interleave medium-risk messages first to raise the mean:
# weighted_mean ≈ 0.50, delta = 0.72 − 0.50 = 0.22 < spike_delta (0.35)
# → NO spike boost → final score stays at 0.72 instead of 0.97,
#   which escapes any escalate_threshold set between those two values.
# Tuning the mean this way is trivially achievable.
```
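The arithmetic in the example above can be reproduced with a minimal sketch of the spike check. The `spike_delta` (0.35) and boost (0.25) values come from this report; the exponential decay, the half-life, and the names `weighted_mean` and `final_score` are assumptions for illustration and may not match HumaneProxy's actual implementation.

```python
SPIKE_DELTA = 0.35         # from the example above
SPIKE_BOOST = 0.25         # from the Summary
HALF_LIFE_S = 6 * 3600.0   # assumed half-life; the real decay schedule may differ


def weighted_mean(window, now):
    """Exponentially time-decayed mean of (timestamp, score) pairs."""
    weights = [0.5 ** ((now - ts) / HALF_LIFE_S) for ts, _ in window]
    total = sum(weights)
    if total == 0:
        return 0.0
    return sum(w * s for w, (_, s) in zip(weights, window)) / total


def final_score(window, raw, now):
    """Add the boost only when raw exceeds the decayed mean by SPIKE_DELTA."""
    delta = raw - weighted_mean(window, now)
    return raw + SPIKE_BOOST if delta >= SPIKE_DELTA else raw


now = 1_000.0
benign = [(now - 40, 0.02), (now - 30, 0.03), (now - 20, 0.04), (now - 10, 0.03)]
padded = [(now - 40, 0.45), (now - 30, 0.50), (now - 20, 0.55), (now - 10, 0.50)]

benign_result = final_score(benign, 0.72, now)  # low baseline: boost applies
padded_result = final_score(padded, 0.72, now)  # raised baseline: no boost
```

Under these assumptions the same raw score of 0.72 is boosted against the benign window but passes through unchanged against the padded one, which is the gap the attacker exploits.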

### Scenario B — Cross-Session Trajectory Poisoning (DoS / False Positive)

An attacker who knows a victim's `session_id` (e.g., it's their username) sends
high-score messages under the victim's session. The victim's subsequent
messages, however innocent, are then evaluated against an inflated baseline.
This distorts spike detection for the victim and can produce bogus escalation
alerts and potentially a care-response block sent to an unsuspecting user.

```python
# Attacker poisons victim's session
check("I hate everything and want to die",  session_id="victim-user-99")  # score ≈ 0.95
check("Nobody cares if I disappear",        session_id="victim-user-99")  # score ≈ 0.85

# Victim sends an innocent next message:
check("Can you recommend a good movie?",    session_id="victim-user-99")  # raw: 0.05
# weighted_mean from poisoned window ≈ 0.90
# delta = 0.05 − 0.90 = −0.85 → no spike (drop, not rise)
# BUT the window is now poisoned for every later message the victim sends:
# the baseline stays elevated until the injected entries decay out (24h default).
```

---

## Root Cause

The `session_id` field accepted by `check()`, `check_async()`, and the
`/v1/check` HTTP endpoint is **purely caller-supplied** with no binding to an
authenticated identity. The trajectory store (SQLite/Redis/Postgres) uses it
as a raw key:

- `get_session_risk(session_id)` — fetches any session by its string key
- `list_recent_escalations(session_id=...)` — filters by raw string
- `DELETE /admin/sessions/{id}` — deletes by raw string

There is no concept of a **session owner** — the session namespace is globally
flat and writable by any caller.
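The flat namespace can be illustrated with a small in-memory model. `FlatTrajectoryStore` and `WINDOW_SIZE` are hypothetical names, not HumaneProxy's actual classes; the sketch only shows that a store keyed by the raw string cannot tell two callers apart.

```python
from collections import defaultdict, deque

WINDOW_SIZE = 5  # assumed rolling-window length


class FlatTrajectoryStore:
    """Hypothetical store keyed only by the raw session_id string."""

    def __init__(self):
        self._windows = defaultdict(lambda: deque(maxlen=WINDOW_SIZE))

    def record(self, session_id, score):
        # No caller identity is checked: any caller may append to any key.
        self._windows[session_id].append(score)

    def mean(self, session_id):
        window = self._windows[session_id]
        return sum(window) / len(window) if window else 0.0


store = FlatTrajectoryStore()
store.record("victim-user-99", 0.05)  # written by the victim
store.record("victim-user-99", 0.95)  # written by an attacker reusing the key
store.record("victim-user-99", 0.85)  # written by an attacker reusing the key
victim_baseline = store.mean("victim-user-99")
```

After the two injected writes, the victim's baseline is dominated by scores the victim never produced.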

---

## Affected Components

| Component | Role in Bug |
|---|---|
| `humane_proxy/pipeline.py` (or equivalent) | `check()` / `check_async()` accept `session_id` without ownership validation |
| `humane_proxy/trajectory.py` (or equivalent) | Rolling window read/write uses raw `session_id` as the store key |
| `humane_proxy/storage/` (sqlite/redis/postgres backends) | No per-session ownership column or ACL |
| REST proxy endpoint `POST /v1/chat/completions` | Forwards `session_id` from request body without verification |
| MCP tool `check_message_safety` | Exposes `session_id` parameter to any connected AI agent |

---

## Proposed Fix

**1. Bind session ownership at creation time.**

When a `session_id` is first written to the store, record a `session_owner`
token (e.g., a hash of the API key or client IP + secret). On all subsequent
writes to that session, verify the token matches:

```python
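import hmac

def _assert_session_owner(self, session_id: str, owner_token: str) -> None:
    """Claim an unowned session, or reject a caller that does not own it."""
    existing_owner = self._store.get_session_owner(session_id)
    if existing_owner is None:
        # First write: bind the session to this caller.
        self._store.set_session_owner(session_id, owner_token)
    elif not hmac.compare_digest(existing_owner, owner_token):
        # Constant-time comparison (ASCII tokens such as hex digests).
        raise SessionOwnershipError(
            f"session_id {session_id!r} belongs to a different caller"
        )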
```
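One way the owner token could be derived is a keyed hash (HMAC) of the caller's API key, so the stored value reveals nothing about the key itself. The sketch below is an assumption-laden illustration: `derive_owner_token`, `OwnerRegistry`, and the placeholder secret are hypothetical, and the in-memory dict stands in for a real storage backend.

```python
import hashlib
import hmac


class SessionOwnershipError(Exception):
    """Raised when a caller writes to a session it does not own."""


def derive_owner_token(api_key: str, server_secret: bytes) -> str:
    """Keyed SHA-256 hash of the caller's API key, returned as hex."""
    return hmac.new(server_secret, api_key.encode("utf-8"), hashlib.sha256).hexdigest()


class OwnerRegistry:
    """Hypothetical in-memory stand-in for the storage backend."""

    def __init__(self):
        self._owners = {}

    def assert_owner(self, session_id: str, owner_token: str) -> None:
        # setdefault claims the session on first use and returns the stored owner.
        existing = self._owners.setdefault(session_id, owner_token)
        if not hmac.compare_digest(existing, owner_token):
            raise SessionOwnershipError(
                f"session_id {session_id!r} belongs to a different caller"
            )


secret = b"example-server-secret"  # placeholder; load from configuration in practice
registry = OwnerRegistry()
alice = derive_owner_token("alice-api-key", secret)
mallory = derive_owner_token("mallory-api-key", secret)

registry.assert_owner("victim-user-99", alice)  # first write claims the session
try:
    registry.assert_owner("victim-user-99", mallory)
    rejected = False
except SessionOwnershipError:
    rejected = True
```

A real backend would need the claim step to be atomic, since two concurrent first writes could otherwise both believe they own the session.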

**2. Add an `owner_token` column (or equivalent field) to all storage backends.**

```sql
-- NULL marks a session that no caller has claimed yet.
ALTER TABLE sessions ADD COLUMN owner_token TEXT;
```
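For the SQLite backend, the migration and an atomic claim could look roughly like the following. Only the `sessions` table and `owner_token` column names come from this report; the minimal schema, the `session_id` primary key, and the `claim_or_verify` helper are assumptions. Rows with a NULL or empty `owner_token` are treated as unclaimed so that sessions created before the migration can still be adopted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed minimal pre-migration schema with one legacy row.
conn.execute("CREATE TABLE sessions (session_id TEXT PRIMARY KEY)")
conn.execute("INSERT INTO sessions (session_id) VALUES ('legacy-session')")
conn.execute("ALTER TABLE sessions ADD COLUMN owner_token TEXT")


def claim_or_verify(conn, session_id, owner_token):
    """Return True if the caller owns the session after a claim attempt."""
    conn.execute(
        "INSERT OR IGNORE INTO sessions (session_id) VALUES (?)", (session_id,)
    )
    # The WHERE clause makes the claim a no-op once an owner is recorded.
    conn.execute(
        "UPDATE sessions SET owner_token = ? "
        "WHERE session_id = ? AND (owner_token IS NULL OR owner_token = '')",
        (owner_token, session_id),
    )
    row = conn.execute(
        "SELECT owner_token FROM sessions WHERE session_id = ?", (session_id,)
    ).fetchone()
    return row[0] == owner_token


first_claim = claim_or_verify(conn, "legacy-session", "token-a")
second_claim = claim_or_verify(conn, "legacy-session", "token-b")
```

Folding the claim into a single conditional `UPDATE` avoids a separate read-then-write step, which would otherwise leave a window for two callers to claim the same session.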

**3. Document the threat in `SECURITY.md` and the configuration reference.**

Until a full fix ships, operators should be warned to treat `session_id` as a
sensitive value and avoid using predictable identifiers (usernames, emails,
sequential IDs).
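As an interim mitigation on the caller side, session identifiers can be generated from a cryptographically secure random source instead of being derived from user data. This is a minimal sketch; `new_session_id` is a hypothetical helper, and how the ID is mapped back to a user is left to the integrating application.

```python
import secrets


def new_session_id() -> str:
    """Return an unguessable, URL-safe session identifier (32 random bytes)."""
    return secrets.token_urlsafe(32)


session_a = new_session_id()
session_b = new_session_id()
```

Random identifiers only make guessing impractical. They do not replace ownership binding, because an ID that leaks through logs or URLs is still writable by anyone who holds it.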
