Skip to content

Santize Webhook Logging and Do not return raw upstream bodies to clients#36

Open
Shayan-Bhowmik wants to merge 3 commits into
Vishisht16:mainfrom
Shayan-Bhowmik:webhook-logging
Open

Santize Webhook Logging and Do not return raw upstream bodies to clients#36
Shayan-Bhowmik wants to merge 3 commits into
Vishisht16:mainfrom
Shayan-Bhowmik:webhook-logging

Conversation

@Shayan-Bhowmik

Copy link
Copy Markdown

Description

Fix two security/privacy issues in the request pipeline:

Type of Change

  • Bug fix
  • New feature
  • Breaking change
  • Documentation update

Checklist

  • I have read the CONTRIBUTING guide
  • My code follows the project's style
  • I have added tests for new or changed behaviour
  • All tests pass (pytest tests/ -v)
  • I have updated documentation if needed
  • Self-harm / safety-related changes have been reviewed for sensitivity

Copilot AI review requested due to automatic review settings May 29, 2026 15:24
@CLAassistant

Copy link
Copy Markdown

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
You have signed the CLA already but the status is still pending? Let us recheck it.

@coderabbitai

coderabbitai Bot commented May 29, 2026

Copy link
Copy Markdown

Review Change Stack

Warning

Review limit reached

@Shayan-Bhowmik, we couldn't start this review because you've reached your PR review rate limit.

More reviews will be available in 54 minutes and 58 seconds. Learn how PR review limits work.

Your organization has run out of usage credits. Purchase more in the billing tab.

⌛ How to resolve this issue?

After more reviews become available, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans include higher PR review limits than trial, open-source, and free plans. In all cases, reviews become available again over time. During sustained high-volume PR review activity, CodeRabbit may temporarily slow when the next review becomes available.

Please see our Fair Usage Limits Policy for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 963f5a5b-1f5c-4b3f-b742-efc0bdf85d2d

📥 Commits

Reviewing files that changed from the base of the PR and between 39ec464 and 941fa07.

📒 Files selected for processing (2)
  • humane_proxy/middleware/interceptor.py
  • tests/test_interceptor.py
📝 Walkthrough

Walkthrough

Sanitizes webhook URLs to host-only form and redacts response bodies in webhook dispatcher logs; adds upstream helper to return a redacted marker and modifies non-JSON upstream handling to log the condition but not return raw content; adds tests and bumps dependency minimums.

Changes

Sensitive Data Protection in Logging

Layer / File(s) Summary
Webhook URL sanitization and logging updates
humane_proxy/escalation/webhooks.py
Added _sanitize_webhook_url() and a response-redaction helper; updated _post() to log sanitized webhook host and response length/status instead of response bodies.
Upstream response body redaction in error handling
humane_proxy/middleware/interceptor.py
Added _redact_upstream_body(); non-JSON upstream response path now reads upstream text, logs a warning (redacted), and returns a 502 error without embedding raw upstream content.
Webhook sanitization tests
tests/test_webhooks.py
Imported logging and _sanitize_webhook_url; added tests that assert _post() redacts URL/token details in logs and that _sanitize_webhook_url() returns a host-only string for Slack URLs and unchanged for non-URLs.
Upstream non-JSON error handling tests
tests/test_interceptor.py
New test class simulates upstream non-JSON response via a fake httpx.AsyncClient and asserts /chat returns 502 with no raw field and no leaked token text in logs.
Dependency version updates
pyproject.toml, requirements.txt
Raised minimum versions for fastapi, httpx, numpy, pyyaml, uvicorn, click, rich, adjusted webhooks extra httpx and crewai[tools] constraints.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related issues

Poem

🐰 I hopped through logs with careful paws,

Snipped URLs to just their hosts and laws,
Hid upstream texts behind a gentle sign,
No tokens spilled, the trace is fine —
A tidy hop, secure and kind.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 33.33% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly summarizes the main changes: webhook logging sanitization and preventing raw upstream bodies from being returned to clients.
Description check ✅ Passed The description is directly related to the changeset, explaining the two security fixes addressed by the pull request and referencing issue #33.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

Hardens logging and error responses to avoid leaking sensitive data from upstream LLM responses and webhook URLs/bodies.

Changes:

  • Strips raw upstream body from non-JSON error responses returned to clients and logs only redacted markers.
  • Sanitizes webhook URLs (host-only) and removes response body content from warning/exception logs.
  • Adds tests covering URL sanitization, redaction of webhook bodies, and non-JSON upstream handling.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.

File Description
humane_proxy/middleware/interceptor.py Removes raw upstream body from error response; logs redacted warning instead.
humane_proxy/escalation/webhooks.py Adds _sanitize_webhook_url and replaces logged URL/body with sanitized/redacted forms.
tests/test_interceptor.py New test verifying non-JSON upstream responses don't leak secrets in body or logs.
tests/test_webhooks.py New tests for URL sanitization and redaction of webhook response logging.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +74 to +77
def _redact_upstream_body(body: str, limit: int = 120) -> str:
"""Return a redacted marker for internal logs."""
_ = (body, limit)
return "[redacted]"
Comment on lines +160 to +164
raw_body = llm_response.text
logger.warning(
"Upstream returned non-JSON (HTTP %d); body redacted",
llm_response.status_code,
)
Comment on lines +29 to +32
def _redact_response_text(text: str, limit: int = 120) -> str:
"""Return a redacted marker for logs."""
_ = limit
return "[redacted]"
Comment on lines +21 to +25
def _sanitize_webhook_url(url: str) -> str:
"""Return a host-only representation of a webhook URL."""
parts = urlsplit(url)
if not parts.scheme or not parts.netloc:
return url

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@humane_proxy/escalation/webhooks.py`:
- Around line 21-26: _sanitize_webhook_url currently returns parts.scheme +
parts.netloc which can include embedded userinfo and leak credentials; change it
to build the host from parts.hostname and parts.port (use parts.hostname
directly and append ":<port>" only if parts.port is not None), preserve the
scheme and return a urlunsplit or equivalent with path/query/fragment empty, and
if parts.hostname is None fall back to returning the original url to avoid
breaking malformed inputs; update the function _sanitize_webhook_url to use
parts.hostname/parts.port instead of parts.netloc.
- Around line 29-32: The helper _redact_response_text is dead code; either
remove it or use it to avoid logging raw response bodies from _post. Fix by (A)
deleting the unused function _redact_response_text if you don't intend to log
any response body, or (B) wiring it into the error logging in _post so that when
resp.status_code >= 400 you log a redacted snippet instead of the full
resp.text/len: call _redact_response_text(resp.text, limit) and include its
return value in the warning message (keep the existing resp.status_code check
and context). Ensure references are to the existing function name
_redact_response_text, the _post function, and resp.status_code/resp.text so
reviewers can locate the change.

In `@humane_proxy/middleware/interceptor.py`:
- Around line 160-168: Remove the unused read of the upstream body: delete the
assignment to raw_body = llm_response.text and any unused variable references in
the interceptor code path that handles non-JSON responses (the block referencing
llm_response and building body = {"status": "error", ...}); keep the existing
logger.warning that redacts the body and retain the constructed body message so
behavior is unchanged.
- Around line 74-77: _delete the unused helper _redact_upstream_body and remove
the unused local raw_body in the upstream non-JSON branch of interceptor.py:
remove the entire _redact_upstream_body function definition and in the upstream
non-JSON branch (where raw_body = llm_response.text is currently set) delete
that assignment and ensure the log/warning uses a concrete redacted marker (e.g.
the literal "[redacted]") instead of relying on the removed helper; reference
symbols: _redact_upstream_body, raw_body, and the upstream non-JSON branch in
humane_proxy/middleware/interceptor.py.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 3680fd7c-dda5-460c-a8cf-de64675e02d0

📥 Commits

Reviewing files that changed from the base of the PR and between 1924099 and 408adb6.

📒 Files selected for processing (4)
  • humane_proxy/escalation/webhooks.py
  • humane_proxy/middleware/interceptor.py
  • tests/test_interceptor.py
  • tests/test_webhooks.py

Comment on lines +21 to +26
def _sanitize_webhook_url(url: str) -> str:
"""Return a host-only representation of a webhook URL."""
parts = urlsplit(url)
if not parts.scheme or not parts.netloc:
return url
return urlunsplit((parts.scheme, parts.netloc, "", "", ""))

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

netloc can still leak embedded credentials.

urlsplit().netloc includes any userinfo component, so a URL like https://user:token@host/path sanitizes to https://user:token@host, leaking the secret you intend to redact. Use hostname (plus port) instead.

🛡️ Proposed fix
     parts = urlsplit(url)
     if not parts.scheme or not parts.netloc:
         return url
-    return urlunsplit((parts.scheme, parts.netloc, "", "", ""))
+    host = parts.hostname or ""
+    netloc = f"{host}:{parts.port}" if parts.port else host
+    return urlunsplit((parts.scheme, netloc, "", "", ""))
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@humane_proxy/escalation/webhooks.py` around lines 21 - 26,
_sanitize_webhook_url currently returns parts.scheme + parts.netloc which can
include embedded userinfo and leak credentials; change it to build the host from
parts.hostname and parts.port (use parts.hostname directly and append ":<port>"
only if parts.port is not None), preserve the scheme and return a urlunsplit or
equivalent with path/query/fragment empty, and if parts.hostname is None fall
back to returning the original url to avoid breaking malformed inputs; update
the function _sanitize_webhook_url to use parts.hostname/parts.port instead of
parts.netloc.

Comment on lines +29 to +32
def _redact_response_text(text: str, limit: int = 120) -> str:
"""Return a redacted marker for logs."""
_ = limit
return "[redacted]"

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Confirm _redact_response_text is not referenced anywhere else.
rg -nP '\b_redact_response_text\s*\('

Repository: Vishisht16/Humane-Proxy

Length of output: 169


Remove (or wire up) the dead _redact_response_text helper. In humane_proxy/escalation/webhooks.py (lines 29-32), _redact_response_text is never called anywhere in the repo; _post’s resp.status_code >= 400 warning logs only len(resp.text) and doesn’t redact using this helper. Delete it or use it where response-body redaction is intended.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@humane_proxy/escalation/webhooks.py` around lines 29 - 32, The helper
_redact_response_text is dead code; either remove it or use it to avoid logging
raw response bodies from _post. Fix by (A) deleting the unused function
_redact_response_text if you don't intend to log any response body, or (B)
wiring it into the error logging in _post so that when resp.status_code >= 400
you log a redacted snippet instead of the full resp.text/len: call
_redact_response_text(resp.text, limit) and include its return value in the
warning message (keep the existing resp.status_code check and context). Ensure
references are to the existing function name _redact_response_text, the _post
function, and resp.status_code/resp.text so reviewers can locate the change.

Comment on lines +74 to +77
def _redact_upstream_body(body: str, limit: int = 120) -> str:
"""Return a redacted marker for internal logs."""
_ = (body, limit)
return "[redacted]"

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Confirm _redact_upstream_body is unused elsewhere.
rg -nP '\b_redact_upstream_body\s*\('

Repository: Vishisht16/Humane-Proxy

Length of output: 172


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# 1) Find all references/usages across the repo (not just calls with parentheses)
rg -n --hidden --no-ignore-vcs '\bredact_upstream_body\b' .

# 2) Show the surrounding implementation and any potential call sites in the target file
sed -n '1,120p' humane_proxy/middleware/interceptor.py
sed -n '130,220p' humane_proxy/middleware/interceptor.py
sed -n '220,320p' humane_proxy/middleware/interceptor.py

Repository: Vishisht16/Humane-Proxy

Length of output: 49


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Capture both stdout and stderr for the usage search
echo "== References to _redact_upstream_body =="
rg -n --hidden --no-ignore-vcs '\bredact_upstream_body\b' . || true

echo
echo "== Lines around raw_body and error handling =="
nl -ba humane_proxy/middleware/interceptor.py | sed -n '120,210p'

Repository: Vishisht16/Humane-Proxy

Length of output: 219


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "== References to _redact_upstream_body =="
rg -n --hidden --no-ignore-vcs '\bredact_upstream_body\b' . || true

echo
echo "== raw_body assignments =="
rg -n --hidden --no-ignore-vcs '\braw_body\b' humane_proxy/middleware/interceptor.py || true

echo
echo "== error-body handling region =="
rg -n --hidden --no-ignore-vcs 'error|exception|body|redact' humane_proxy/middleware/interceptor.py | head -n 80 || true

echo
echo "== Relevant file sections =="
echo "-- 1-120"
cat -n humane_proxy/middleware/interceptor.py | sed -n '1,120p'
echo "-- 120-220"
cat -n humane_proxy/middleware/interceptor.py | sed -n '120,220p'
echo "-- 220-320"
cat -n humane_proxy/middleware/interceptor.py | sed -n '220,320p'

Repository: Vishisht16/Humane-Proxy

Length of output: 7789


Remove dead redaction helper and unused raw_body

  • _redact_upstream_body in humane_proxy/middleware/interceptor.py has no call sites (only its definition exists).
  • In the upstream non-JSON branch, raw_body = llm_response.text is assigned but never used; the warning claims the body is “redacted” even though the helper isn’t called.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@humane_proxy/middleware/interceptor.py` around lines 74 - 77, _delete the
unused helper _redact_upstream_body and remove the unused local raw_body in the
upstream non-JSON branch of interceptor.py: remove the entire
_redact_upstream_body function definition and in the upstream non-JSON branch
(where raw_body = llm_response.text is currently set) delete that assignment and
ensure the log/warning uses a concrete redacted marker (e.g. the literal
"[redacted]") instead of relying on the removed helper; reference symbols:
_redact_upstream_body, raw_body, and the upstream non-JSON branch in
humane_proxy/middleware/interceptor.py.

Comment on lines +160 to 168
raw_body = llm_response.text
logger.warning(
"Upstream returned non-JSON (HTTP %d); body redacted",
llm_response.status_code,
)
body = {
"status": "error",
"message": f"Upstream returned non-JSON (HTTP {llm_response.status_code}).",
"raw": llm_response.text[:500],
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

raw_body is assigned but never used.

Reading llm_response.text into raw_body materializes the full upstream body without using it. The redaction goal is already met by logging only the status, so drop the unused read.

🧹 Proposed cleanup
         except (ValueError, TypeError):
-            raw_body = llm_response.text
             logger.warning(
                 "Upstream returned non-JSON (HTTP %d); body redacted",
                 llm_response.status_code,
             )
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
raw_body = llm_response.text
logger.warning(
"Upstream returned non-JSON (HTTP %d); body redacted",
llm_response.status_code,
)
body = {
"status": "error",
"message": f"Upstream returned non-JSON (HTTP {llm_response.status_code}).",
"raw": llm_response.text[:500],
}
logger.warning(
"Upstream returned non-JSON (HTTP %d); body redacted",
llm_response.status_code,
)
body = {
"status": "error",
"message": f"Upstream returned non-JSON (HTTP {llm_response.status_code}).",
}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@humane_proxy/middleware/interceptor.py` around lines 160 - 168, Remove the
unused read of the upstream body: delete the assignment to raw_body =
llm_response.text and any unused variable references in the interceptor code
path that handles non-JSON responses (the block referencing llm_response and
building body = {"status": "error", ...}); keep the existing logger.warning that
redacts the body and retain the constructed body message so behavior is
unchanged.

Comment thread tests/test_webhooks.py

await _post("https://hooks.slack.com/services/T000/B000/SECRET?token=abc123", {"x": 1})

assert "hooks.slack.com" in caplog.text
@Vishisht16

Copy link
Copy Markdown
Owner

@Shayan-Bhowmik your code introduced a new high severity vulnerability. CodeRabbit and Copilot have also flagged some specific lines in the code. Please look into it and let me know if you need any kind of help.

@Shayan-Bhowmik

Copy link
Copy Markdown
Author

@Vishisht16 looking into it

@Shayan-Bhowmik

Copy link
Copy Markdown
Author

@Vishisht16 I have resolved the merge conflicts and solve the vulnerablities, please review the code.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants