Santize Webhook Logging and Do not return raw upstream bodies to clients#36
Santize Webhook Logging and Do not return raw upstream bodies to clients#36Shayan-Bhowmik wants to merge 3 commits into
Conversation
|
|
|
Warning Review limit reached
More reviews will be available in 54 minutes and 58 seconds. Learn how PR review limits work. Your organization has run out of usage credits. Purchase more in the billing tab. ⌛ How to resolve this issue?After more reviews become available, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans include higher PR review limits than trial, open-source, and free plans. In all cases, reviews become available again over time. During sustained high-volume PR review activity, CodeRabbit may temporarily slow when the next review becomes available. Please see our Fair Usage Limits Policy for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (2)
📝 WalkthroughWalkthroughSanitizes webhook URLs to host-only form and redacts response bodies in webhook dispatcher logs; adds upstream helper to return a redacted marker and modifies non-JSON upstream handling to log the condition but not return raw content; adds tests and bumps dependency minimums. ChangesSensitive Data Protection in Logging
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Possibly related issues
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Pull request overview
Note
Copilot was unable to run its full agentic suite in this review.
Hardens logging and error responses to avoid leaking sensitive data from upstream LLM responses and webhook URLs/bodies.
Changes:
- Strips raw upstream body from non-JSON error responses returned to clients and logs only redacted markers.
- Sanitizes webhook URLs (host-only) and removes response body content from warning/exception logs.
- Adds tests covering URL sanitization, redaction of webhook bodies, and non-JSON upstream handling.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| humane_proxy/middleware/interceptor.py | Removes raw upstream body from error response; logs redacted warning instead. |
| humane_proxy/escalation/webhooks.py | Adds _sanitize_webhook_url and replaces logged URL/body with sanitized/redacted forms. |
| tests/test_interceptor.py | New test verifying non-JSON upstream responses don't leak secrets in body or logs. |
| tests/test_webhooks.py | New tests for URL sanitization and redaction of webhook response logging. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| def _redact_upstream_body(body: str, limit: int = 120) -> str: | ||
| """Return a redacted marker for internal logs.""" | ||
| _ = (body, limit) | ||
| return "[redacted]" |
| raw_body = llm_response.text | ||
| logger.warning( | ||
| "Upstream returned non-JSON (HTTP %d); body redacted", | ||
| llm_response.status_code, | ||
| ) |
| def _redact_response_text(text: str, limit: int = 120) -> str: | ||
| """Return a redacted marker for logs.""" | ||
| _ = limit | ||
| return "[redacted]" |
| def _sanitize_webhook_url(url: str) -> str: | ||
| """Return a host-only representation of a webhook URL.""" | ||
| parts = urlsplit(url) | ||
| if not parts.scheme or not parts.netloc: | ||
| return url |
There was a problem hiding this comment.
Actionable comments posted: 4
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@humane_proxy/escalation/webhooks.py`:
- Around line 21-26: _sanitize_webhook_url currently returns parts.scheme +
parts.netloc which can include embedded userinfo and leak credentials; change it
to build the host from parts.hostname and parts.port (use parts.hostname
directly and append ":<port>" only if parts.port is not None), preserve the
scheme and return a urlunsplit or equivalent with path/query/fragment empty, and
if parts.hostname is None fall back to returning the original url to avoid
breaking malformed inputs; update the function _sanitize_webhook_url to use
parts.hostname/parts.port instead of parts.netloc.
- Around line 29-32: The helper _redact_response_text is dead code; either
remove it or use it to avoid logging raw response bodies from _post. Fix by (A)
deleting the unused function _redact_response_text if you don't intend to log
any response body, or (B) wiring it into the error logging in _post so that when
resp.status_code >= 400 you log a redacted snippet instead of the full
resp.text/len: call _redact_response_text(resp.text, limit) and include its
return value in the warning message (keep the existing resp.status_code check
and context). Ensure references are to the existing function name
_redact_response_text, the _post function, and resp.status_code/resp.text so
reviewers can locate the change.
In `@humane_proxy/middleware/interceptor.py`:
- Around line 160-168: Remove the unused read of the upstream body: delete the
assignment to raw_body = llm_response.text and any unused variable references in
the interceptor code path that handles non-JSON responses (the block referencing
llm_response and building body = {"status": "error", ...}); keep the existing
logger.warning that redacts the body and retain the constructed body message so
behavior is unchanged.
- Around line 74-77: _delete the unused helper _redact_upstream_body and remove
the unused local raw_body in the upstream non-JSON branch of interceptor.py:
remove the entire _redact_upstream_body function definition and in the upstream
non-JSON branch (where raw_body = llm_response.text is currently set) delete
that assignment and ensure the log/warning uses a concrete redacted marker (e.g.
the literal "[redacted]") instead of relying on the removed helper; reference
symbols: _redact_upstream_body, raw_body, and the upstream non-JSON branch in
humane_proxy/middleware/interceptor.py.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 3680fd7c-dda5-460c-a8cf-de64675e02d0
📒 Files selected for processing (4)
humane_proxy/escalation/webhooks.pyhumane_proxy/middleware/interceptor.pytests/test_interceptor.pytests/test_webhooks.py
| def _sanitize_webhook_url(url: str) -> str: | ||
| """Return a host-only representation of a webhook URL.""" | ||
| parts = urlsplit(url) | ||
| if not parts.scheme or not parts.netloc: | ||
| return url | ||
| return urlunsplit((parts.scheme, parts.netloc, "", "", "")) |
There was a problem hiding this comment.
netloc can still leak embedded credentials.
urlsplit().netloc includes any userinfo component, so a URL like https://user:token@host/path sanitizes to https://user:token@host, leaking the secret you intend to redact. Use hostname (plus port) instead.
🛡️ Proposed fix
parts = urlsplit(url)
if not parts.scheme or not parts.netloc:
return url
- return urlunsplit((parts.scheme, parts.netloc, "", "", ""))
+ host = parts.hostname or ""
+ netloc = f"{host}:{parts.port}" if parts.port else host
+ return urlunsplit((parts.scheme, netloc, "", "", ""))🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@humane_proxy/escalation/webhooks.py` around lines 21 - 26,
_sanitize_webhook_url currently returns parts.scheme + parts.netloc which can
include embedded userinfo and leak credentials; change it to build the host from
parts.hostname and parts.port (use parts.hostname directly and append ":<port>"
only if parts.port is not None), preserve the scheme and return a urlunsplit or
equivalent with path/query/fragment empty, and if parts.hostname is None fall
back to returning the original url to avoid breaking malformed inputs; update
the function _sanitize_webhook_url to use parts.hostname/parts.port instead of
parts.netloc.
| def _redact_response_text(text: str, limit: int = 120) -> str: | ||
| """Return a redacted marker for logs.""" | ||
| _ = limit | ||
| return "[redacted]" |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Confirm _redact_response_text is not referenced anywhere else.
rg -nP '\b_redact_response_text\s*\('Repository: Vishisht16/Humane-Proxy
Length of output: 169
Remove (or wire up) the dead _redact_response_text helper. In humane_proxy/escalation/webhooks.py (lines 29-32), _redact_response_text is never called anywhere in the repo; _post’s resp.status_code >= 400 warning logs only len(resp.text) and doesn’t redact using this helper. Delete it or use it where response-body redaction is intended.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@humane_proxy/escalation/webhooks.py` around lines 29 - 32, The helper
_redact_response_text is dead code; either remove it or use it to avoid logging
raw response bodies from _post. Fix by (A) deleting the unused function
_redact_response_text if you don't intend to log any response body, or (B)
wiring it into the error logging in _post so that when resp.status_code >= 400
you log a redacted snippet instead of the full resp.text/len: call
_redact_response_text(resp.text, limit) and include its return value in the
warning message (keep the existing resp.status_code check and context). Ensure
references are to the existing function name _redact_response_text, the _post
function, and resp.status_code/resp.text so reviewers can locate the change.
| def _redact_upstream_body(body: str, limit: int = 120) -> str: | ||
| """Return a redacted marker for internal logs.""" | ||
| _ = (body, limit) | ||
| return "[redacted]" |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Confirm _redact_upstream_body is unused elsewhere.
rg -nP '\b_redact_upstream_body\s*\('Repository: Vishisht16/Humane-Proxy
Length of output: 172
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# 1) Find all references/usages across the repo (not just calls with parentheses)
rg -n --hidden --no-ignore-vcs '\bredact_upstream_body\b' .
# 2) Show the surrounding implementation and any potential call sites in the target file
sed -n '1,120p' humane_proxy/middleware/interceptor.py
sed -n '130,220p' humane_proxy/middleware/interceptor.py
sed -n '220,320p' humane_proxy/middleware/interceptor.pyRepository: Vishisht16/Humane-Proxy
Length of output: 49
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Capture both stdout and stderr for the usage search
echo "== References to _redact_upstream_body =="
rg -n --hidden --no-ignore-vcs '\bredact_upstream_body\b' . || true
echo
echo "== Lines around raw_body and error handling =="
nl -ba humane_proxy/middleware/interceptor.py | sed -n '120,210p'Repository: Vishisht16/Humane-Proxy
Length of output: 219
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "== References to _redact_upstream_body =="
rg -n --hidden --no-ignore-vcs '\bredact_upstream_body\b' . || true
echo
echo "== raw_body assignments =="
rg -n --hidden --no-ignore-vcs '\braw_body\b' humane_proxy/middleware/interceptor.py || true
echo
echo "== error-body handling region =="
rg -n --hidden --no-ignore-vcs 'error|exception|body|redact' humane_proxy/middleware/interceptor.py | head -n 80 || true
echo
echo "== Relevant file sections =="
echo "-- 1-120"
cat -n humane_proxy/middleware/interceptor.py | sed -n '1,120p'
echo "-- 120-220"
cat -n humane_proxy/middleware/interceptor.py | sed -n '120,220p'
echo "-- 220-320"
cat -n humane_proxy/middleware/interceptor.py | sed -n '220,320p'Repository: Vishisht16/Humane-Proxy
Length of output: 7789
Remove dead redaction helper and unused raw_body
_redact_upstream_bodyinhumane_proxy/middleware/interceptor.pyhas no call sites (only its definition exists).- In the upstream non-JSON branch,
raw_body = llm_response.textis assigned but never used; the warning claims the body is “redacted” even though the helper isn’t called.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@humane_proxy/middleware/interceptor.py` around lines 74 - 77, _delete the
unused helper _redact_upstream_body and remove the unused local raw_body in the
upstream non-JSON branch of interceptor.py: remove the entire
_redact_upstream_body function definition and in the upstream non-JSON branch
(where raw_body = llm_response.text is currently set) delete that assignment and
ensure the log/warning uses a concrete redacted marker (e.g. the literal
"[redacted]") instead of relying on the removed helper; reference symbols:
_redact_upstream_body, raw_body, and the upstream non-JSON branch in
humane_proxy/middleware/interceptor.py.
| raw_body = llm_response.text | ||
| logger.warning( | ||
| "Upstream returned non-JSON (HTTP %d); body redacted", | ||
| llm_response.status_code, | ||
| ) | ||
| body = { | ||
| "status": "error", | ||
| "message": f"Upstream returned non-JSON (HTTP {llm_response.status_code}).", | ||
| "raw": llm_response.text[:500], | ||
| } |
There was a problem hiding this comment.
raw_body is assigned but never used.
Reading llm_response.text into raw_body materializes the full upstream body without using it. The redaction goal is already met by logging only the status, so drop the unused read.
🧹 Proposed cleanup
except (ValueError, TypeError):
- raw_body = llm_response.text
logger.warning(
"Upstream returned non-JSON (HTTP %d); body redacted",
llm_response.status_code,
)📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| raw_body = llm_response.text | |
| logger.warning( | |
| "Upstream returned non-JSON (HTTP %d); body redacted", | |
| llm_response.status_code, | |
| ) | |
| body = { | |
| "status": "error", | |
| "message": f"Upstream returned non-JSON (HTTP {llm_response.status_code}).", | |
| "raw": llm_response.text[:500], | |
| } | |
| logger.warning( | |
| "Upstream returned non-JSON (HTTP %d); body redacted", | |
| llm_response.status_code, | |
| ) | |
| body = { | |
| "status": "error", | |
| "message": f"Upstream returned non-JSON (HTTP {llm_response.status_code}).", | |
| } |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@humane_proxy/middleware/interceptor.py` around lines 160 - 168, Remove the
unused read of the upstream body: delete the assignment to raw_body =
llm_response.text and any unused variable references in the interceptor code
path that handles non-JSON responses (the block referencing llm_response and
building body = {"status": "error", ...}); keep the existing logger.warning that
redacts the body and retain the constructed body message so behavior is
unchanged.
|
|
||
| await _post("https://hooks.slack.com/services/T000/B000/SECRET?token=abc123", {"x": 1}) | ||
|
|
||
| assert "hooks.slack.com" in caplog.text |
|
@Shayan-Bhowmik your code introduced a new high severity vulnerability. CodeRabbit and Copilot have also flagged some specific lines in the code. Please look into it and let me know if you need any kind of help. |
|
@Vishisht16 looking into it |
|
@Vishisht16 I have resolved the merge conflicts and solve the vulnerablities, please review the code. |
Description
Fix two security/privacy issues in the request pipeline:
Type of Change
Checklist
pytest tests/ -v)