Sanitize Webhook Logging and Do not return raw upstream bodies to clients by Shayan-Bhowmik · Pull Request #36 · Vishisht16/Humane-Proxy

Shayan-Bhowmik · 2026-05-29T15:24:42Z

Description

Fix two security/privacy issues in the request pipeline:

- Webhook error logging now redacts webhook URL paths and does not log potentially sensitive URL substrings.
- The interceptor no longer returns raw upstream non-JSON response bodies to clients; it logs a redacted internal message instead.

Solves [FEATURE]: Do not log full webhook URLs or return raw upstream LLM response bodies #33

Type of Change

Bug fix
New feature
Breaking change
Documentation update

Checklist

I have read the CONTRIBUTING guide
My code follows the project's style
I have added tests for new or changed behaviour
All tests pass (pytest tests/ -v)
I have updated documentation if needed
Self-harm / safety-related changes have been reviewed for sensitivity

CLAassistant · 2026-05-29T15:24:53Z

Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
coderabbitai · 2026-05-29T15:25:02Z

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 963f5a5b-1f5c-4b3f-b742-efc0bdf85d2d

📥 Commits

Reviewing files that changed from the base of the PR and between 39ec464 and 941fa07.

📒 Files selected for processing (2)

humane_proxy/middleware/interceptor.py
tests/test_interceptor.py

📝 Walkthrough

Walkthrough

Sanitizes webhook URLs to host-only form and redacts response bodies in webhook dispatcher logs; adds upstream helper to return a redacted marker and modifies non-JSON upstream handling to log the condition but not return raw content; adds tests and bumps dependency minimums.

Changes

Sensitive Data Protection in Logging

| Layer / File(s) | Summary |
| --- | --- |
| Webhook URL sanitization and logging updates `humane_proxy/escalation/webhooks.py` | Added `_sanitize_webhook_url()` and a response-redaction helper; updated `_post()` to log sanitized webhook host and response length/status instead of response bodies. |
| Upstream response body redaction in error handling `humane_proxy/middleware/interceptor.py` | Added `_redact_upstream_body()`; non-JSON upstream response path now reads upstream text, logs a warning (redacted), and returns a 502 error without embedding raw upstream content. |
| Webhook sanitization tests `tests/test_webhooks.py` | Imported `logging` and `_sanitize_webhook_url`; added tests that assert `_post()` redacts URL/token details in logs and that `_sanitize_webhook_url()` returns a host-only string for Slack URLs and leaves non-URLs unchanged. |
| Upstream non-JSON error handling tests `tests/test_interceptor.py` | New test class simulates upstream non-JSON response via a fake `httpx.AsyncClient` and asserts `/chat` returns 502 with no `raw` field and no leaked token text in logs. |
| Dependency version updates `pyproject.toml`, `requirements.txt` | Raised minimum versions for `fastapi`, `httpx`, `numpy`, `pyyaml`, `uvicorn`, `click`, `rich`; adjusted `webhooks` extra `httpx` and `crewai[tools]` constraints. |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related issues

[FEATURE]: Do not log full webhook URLs or return raw upstream LLM response bodies #33: Directly related — same files/functions addressed to sanitize webhook URLs and avoid returning raw upstream LLM response bodies.

Poem

🐰 I hopped through logs with careful paws,

Snipped URLs to just their hosts and laws,
Hid upstream texts behind a gentle sign,
No tokens spilled, the trace is fine —
A tidy hop, secure and kind.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 33.33% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title clearly summarizes the main changes: webhook logging sanitization and preventing raw upstream bodies from being returned to clients. |
| Description check | ✅ Passed | The description is directly related to the changeset, explaining the two security fixes addressed by the pull request and referencing issue `#33`. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

Copilot

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

Hardens logging and error responses to avoid leaking sensitive data from upstream LLM responses and webhook URLs/bodies.

Changes:

Strips raw upstream body from non-JSON error responses returned to clients and logs only redacted markers.
Sanitizes webhook URLs (host-only) and removes response body content from warning/exception logs.
Adds tests covering URL sanitization, redaction of webhook bodies, and non-JSON upstream handling.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| humane_proxy/middleware/interceptor.py | Removes raw upstream body from error response; logs redacted warning instead. |
| humane_proxy/escalation/webhooks.py | Adds `_sanitize_webhook_url` and replaces logged URL/body with sanitized/redacted forms. |
| tests/test_interceptor.py | New test verifying non-JSON upstream responses don't leak secrets in body or logs. |
| tests/test_webhooks.py | New tests for URL sanitization and redaction of webhook response logging. |

+def _redact_upstream_body(body: str, limit: int = 120) -> str:
+    """Return a redacted marker for internal logs."""
+    _ = (body, limit)
+    return "[redacted]"


+            raw_body = llm_response.text
+            logger.warning(
+                "Upstream returned non-JSON (HTTP %d); body redacted",
+                llm_response.status_code,
+            )


+def _redact_response_text(text: str, limit: int = 120) -> str:
+    """Return a redacted marker for logs."""
+    _ = limit
+    return "[redacted]"


+def _sanitize_webhook_url(url: str) -> str:
+    """Return a host-only representation of a webhook URL."""
+    parts = urlsplit(url)
+    if not parts.scheme or not parts.netloc:
+        return url


coderabbitai

Actionable comments posted: 4

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@humane_proxy/escalation/webhooks.py`:
- Around line 21-26: _sanitize_webhook_url currently returns parts.scheme +
parts.netloc which can include embedded userinfo and leak credentials; change it
to build the host from parts.hostname and parts.port (use parts.hostname
directly and append ":<port>" only if parts.port is not None), preserve the
scheme and return a urlunsplit or equivalent with path/query/fragment empty, and
if parts.hostname is None fall back to returning the original url to avoid
breaking malformed inputs; update the function _sanitize_webhook_url to use
parts.hostname/parts.port instead of parts.netloc.
- Around line 29-32: The helper _redact_response_text is dead code; either
remove it or use it to avoid logging raw response bodies from _post. Fix by (A)
deleting the unused function _redact_response_text if you don't intend to log
any response body, or (B) wiring it into the error logging in _post so that when
resp.status_code >= 400 you log a redacted snippet instead of the full
resp.text/len: call _redact_response_text(resp.text, limit) and include its
return value in the warning message (keep the existing resp.status_code check
and context). Ensure references are to the existing function name
_redact_response_text, the _post function, and resp.status_code/resp.text so
reviewers can locate the change.

In `@humane_proxy/middleware/interceptor.py`:
- Around line 160-168: Remove the unused read of the upstream body: delete the
assignment to raw_body = llm_response.text and any unused variable references in
the interceptor code path that handles non-JSON responses (the block referencing
llm_response and building body = {"status": "error", ...}); keep the existing
logger.warning that redacts the body and retain the constructed body message so
behavior is unchanged.
- Around line 74-77: Delete the unused helper _redact_upstream_body and remove
the unused local raw_body in the upstream non-JSON branch of interceptor.py:
remove the entire _redact_upstream_body function definition and in the upstream
non-JSON branch (where raw_body = llm_response.text is currently set) delete
that assignment and ensure the log/warning uses a concrete redacted marker (e.g.
the literal "[redacted]") instead of relying on the removed helper; reference
symbols: _redact_upstream_body, raw_body, and the upstream non-JSON branch in
humane_proxy/middleware/interceptor.py.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 3680fd7c-dda5-460c-a8cf-de64675e02d0

📥 Commits

Reviewing files that changed from the base of the PR and between 1924099 and 408adb6.

📒 Files selected for processing (4)

humane_proxy/escalation/webhooks.py
humane_proxy/middleware/interceptor.py
tests/test_interceptor.py
tests/test_webhooks.py

coderabbitai · 2026-05-29T15:29:29Z

+def _sanitize_webhook_url(url: str) -> str:
+    """Return a host-only representation of a webhook URL."""
+    parts = urlsplit(url)
+    if not parts.scheme or not parts.netloc:
+        return url
+    return urlunsplit((parts.scheme, parts.netloc, "", "", ""))


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

netloc can still leak embedded credentials.

urlsplit().netloc includes any userinfo component, so a URL like https://user:token@host/path sanitizes to https://user:token@host, leaking the secret you intend to redact. Use hostname (plus port) instead.

🛡️ Proposed fix

     parts = urlsplit(url)
     if not parts.scheme or not parts.netloc:
         return url
-    return urlunsplit((parts.scheme, parts.netloc, "", "", ""))
+    host = parts.hostname or ""
+    netloc = f"{host}:{parts.port}" if parts.port else host
+    return urlunsplit((parts.scheme, netloc, "", "", ""))
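
The hostname-based approach suggested in this comment can be tried in isolation. The following is a minimal standalone sketch, not the code merged in this PR: the name `sanitize_webhook_url` (no leading underscore), the IPv6 bracket handling, and the guard around a malformed port are assumptions added for illustration.

```python
from urllib.parse import urlsplit, urlunsplit


def sanitize_webhook_url(url: str) -> str:
    """Return scheme://host[:port], dropping userinfo, path, query, and fragment."""
    parts = urlsplit(url)
    if not parts.scheme or not parts.netloc:
        # Not a recognisable URL: returned unchanged, mirroring the PR's behaviour.
        return url
    host = parts.hostname or ""
    try:
        port = parts.port
    except ValueError:
        # Malformed or out-of-range port: omit it rather than raise while logging.
        port = None
    if ":" in host:
        # IPv6 literals need brackets when placed back into a netloc.
        host = f"[{host}]"
    netloc = f"{host}:{port}" if port is not None else host
    return urlunsplit((parts.scheme, netloc, "", "", ""))


print(sanitize_webhook_url("https://user:token@hooks.example.com:8443/services/T000?token=abc123"))
```

Because `hostname` and `port` are parsed separately from any `user:password@` prefix, the userinfo never reaches the rebuilt URL. Note that returning the input unchanged for non-URLs keeps the PR's behaviour but still logs whatever string was passed in.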

coderabbitai · 2026-05-29T15:29:29Z

+def _redact_response_text(text: str, limit: int = 120) -> str:
+    """Return a redacted marker for logs."""
+    _ = limit
+    return "[redacted]"


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Confirm _redact_response_text is not referenced anywhere else.
rg -nP '\b_redact_response_text\s*\('

Repository: Vishisht16/Humane-Proxy

Length of output: 169

Remove (or wire up) the dead _redact_response_text helper. In humane_proxy/escalation/webhooks.py (lines 29-32), _redact_response_text is never called anywhere in the repo; _post’s resp.status_code >= 400 warning logs only len(resp.text) and doesn’t redact using this helper. Delete it or use it where response-body redaction is intended.
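
As an illustration of the behaviour described here (logging the status and body length rather than the body), the following is a rough sketch under stated assumptions. The names `host_only` and `log_webhook_failure` and the logger name are hypothetical and do not appear in the repository; the real `_post()` works with an `httpx` response, which is replaced here by plain arguments.

```python
import logging
from urllib.parse import urlsplit

logger = logging.getLogger("webhooks_sketch")  # hypothetical logger name


def host_only(url: str) -> str:
    """Return just the hostname of a URL, or a placeholder if there is none."""
    return urlsplit(url).hostname or "[unknown-host]"


def log_webhook_failure(url: str, status_code: int, body: str) -> None:
    """Log a failed webhook delivery without the URL path, query, or response body."""
    if status_code >= 400:
        logger.warning(
            "Webhook to %s returned HTTP %d (%d chars of body omitted)",
            host_only(url),
            status_code,
            len(body),
        )
```

With this shape no redaction helper is needed at all, which matches option (A) in the comment: the body is never formatted into the message, so there is nothing left to redact.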

coderabbitai · 2026-05-29T15:29:29Z

+def _redact_upstream_body(body: str, limit: int = 120) -> str:
+    """Return a redacted marker for internal logs."""
+    _ = (body, limit)
+    return "[redacted]"


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Confirm _redact_upstream_body is unused elsewhere.
rg -nP '\b_redact_upstream_body\s*\('

Repository: Vishisht16/Humane-Proxy

Length of output: 172

🏁 Script executed:

#!/bin/bash
set -euo pipefail
# 1) Find all references/usages across the repo (not just calls with parentheses)
rg -n --hidden --no-ignore-vcs '\bredact_upstream_body\b' .
# 2) Show the surrounding implementation and any potential call sites in the target file
sed -n '1,120p' humane_proxy/middleware/interceptor.py
sed -n '130,220p' humane_proxy/middleware/interceptor.py
sed -n '220,320p' humane_proxy/middleware/interceptor.py

Repository: Vishisht16/Humane-Proxy

Length of output: 49

🏁 Script executed:

#!/bin/bash
set -euo pipefail
# Capture both stdout and stderr for the usage search
echo "== References to _redact_upstream_body =="
rg -n --hidden --no-ignore-vcs '\bredact_upstream_body\b' . || true
echo
echo "== Lines around raw_body and error handling =="
nl -ba humane_proxy/middleware/interceptor.py | sed -n '120,210p'

Repository: Vishisht16/Humane-Proxy

Length of output: 219

🏁 Script executed:

#!/bin/bash
set -euo pipefail
echo "== References to _redact_upstream_body =="
rg -n --hidden --no-ignore-vcs '\bredact_upstream_body\b' . || true
echo
echo "== raw_body assignments =="
rg -n --hidden --no-ignore-vcs '\braw_body\b' humane_proxy/middleware/interceptor.py || true
echo
echo "== error-body handling region =="
rg -n --hidden --no-ignore-vcs 'error|exception|body|redact' humane_proxy/middleware/interceptor.py | head -n 80 || true
echo
echo "== Relevant file sections =="
echo "-- 1-120"
cat -n humane_proxy/middleware/interceptor.py | sed -n '1,120p'
echo "-- 120-220"
cat -n humane_proxy/middleware/interceptor.py | sed -n '120,220p'
echo "-- 220-320"
cat -n humane_proxy/middleware/interceptor.py | sed -n '220,320p'

Repository: Vishisht16/Humane-Proxy

Length of output: 7789

Remove dead redaction helper and unused raw_body

_redact_upstream_body in humane_proxy/middleware/interceptor.py has no call sites (only its definition exists).

In the upstream non-JSON branch, raw_body = llm_response.text is assigned but never used; the warning claims the body is “redacted” even though the helper isn’t called.

coderabbitai · 2026-05-29T15:29:29Z

+            raw_body = llm_response.text
+            logger.warning(
+                "Upstream returned non-JSON (HTTP %d); body redacted",
+                llm_response.status_code,
+            )
            body = {
                "status": "error",
                "message": f"Upstream returned non-JSON (HTTP {llm_response.status_code}).",
-                "raw": llm_response.text[:500],
            }


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

raw_body is assigned but never used.

Reading llm_response.text into raw_body materializes the full upstream body without using it. The redaction goal is already met by logging only the status, so drop the unused read.

🧹 Proposed cleanup

 except (ValueError, TypeError):
-    raw_body = llm_response.text
     logger.warning(
         "Upstream returned non-JSON (HTTP %d); body redacted",
         llm_response.status_code,
     )
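
To see the cleaned-up branch end to end, here is a small self-contained sketch. It is an approximation under assumptions, not the interceptor's actual code: `FakeUpstreamResponse` and `build_client_payload` are hypothetical names, and the success path (passing upstream JSON straight through) is a guess made only so the example runs. The 502 status and the shape of the error body follow the walkthrough and diff in this PR.

```python
import json
import logging
from dataclasses import dataclass

logger = logging.getLogger("interceptor_sketch")  # hypothetical logger name


@dataclass
class FakeUpstreamResponse:
    """Hypothetical stand-in for the upstream HTTP response object."""

    status_code: int
    text: str

    def json(self):
        # json.JSONDecodeError is a subclass of ValueError.
        return json.loads(self.text)


def build_client_payload(llm_response) -> tuple[int, dict]:
    """Return (HTTP status, body) for the client without echoing non-JSON upstream text."""
    try:
        return llm_response.status_code, llm_response.json()
    except (ValueError, TypeError):
        # The body is never read here, so it cannot reach the log or the client.
        logger.warning(
            "Upstream returned non-JSON (HTTP %d); body redacted",
            llm_response.status_code,
        )
        body = {
            "status": "error",
            "message": f"Upstream returned non-JSON (HTTP {llm_response.status_code}).",
        }
        return 502, body
```

Calling `build_client_payload(FakeUpstreamResponse(200, "<html>token=abc123</html>"))` yields a 502 with only `status` and `message` keys, so nothing from the upstream text is forwarded.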

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

-            raw_body = llm_response.text
-            logger.warning(
-                "Upstream returned non-JSON (HTTP %d); body redacted",
-                llm_response.status_code,
-            )
-            body = {
-                "status": "error",
-                "message": f"Upstream returned non-JSON (HTTP {llm_response.status_code}).",
-                "raw": llm_response.text[:500],
-            }
+            logger.warning(
+                "Upstream returned non-JSON (HTTP %d); body redacted",
+                llm_response.status_code,
+            )
+            body = {
+                "status": "error",
+                "message": f"Upstream returned non-JSON (HTTP {llm_response.status_code}).",
+            }

+
+            await _post("https://hooks.slack.com/services/T000/B000/SECRET?token=abc123", {"x": 1})
+
+        assert "hooks.slack.com" in caplog.text
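
The excerpt above shows only a positive check that the host appears in the captured log. A negative check, asserting that path and token fragments are absent, is what actually demonstrates redaction. The following is a hedged sketch of that idea using only the standard library; `ListHandler` and `leaked_fragments` are hypothetical helpers standing in for pytest's `caplog`, and the logged message is invented for the example.

```python
import logging


class ListHandler(logging.Handler):
    """Collect formatted log messages in memory (a stand-in for pytest's caplog)."""

    def __init__(self) -> None:
        super().__init__()
        self.messages: list[str] = []

    def emit(self, record: logging.LogRecord) -> None:
        self.messages.append(record.getMessage())


def leaked_fragments(log_text: str, secrets: list[str]) -> list[str]:
    """Return the secret fragments that appear anywhere in the captured log text."""
    return [s for s in secrets if s in log_text]


handler = ListHandler()
log = logging.getLogger("webhook_test_sketch")  # hypothetical logger name
log.addHandler(handler)
log.propagate = False  # keep the example's output out of the root logger

# Stand-in for what a sanitizing _post() might log on failure.
log.warning("Webhook to %s failed (HTTP %d)", "https://hooks.slack.com", 500)

captured = "\n".join(handler.messages)
leaks = leaked_fragments(captured, ["/services/", "T000", "B000", "SECRET", "abc123"])
```

In a real test, `leaks` would be asserted to be empty alongside the existing host check.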


Vishisht16 · 2026-05-31T12:59:05Z

@Shayan-Bhowmik your code introduced a new high severity vulnerability. CodeRabbit and Copilot have also flagged some specific lines in the code. Please look into it and let me know if you need any kind of help.

Shayan-Bhowmik · 2026-06-01T10:28:22Z

@Vishisht16 looking into it

Shayan-Bhowmik · 2026-06-01T10:53:30Z

@Vishisht16 I have resolved the merge conflicts and fixed the vulnerabilities. Please review the code.

Santize Webhook Logging and Do not return raw upstream bodies to clients

408adb6

Copilot AI review requested due to automatic review settings May 29, 2026 15:24

Shayan-Bhowmik requested a review from Vishisht16 as a code owner May 29, 2026 15:24

Copilot AI reviewed May 29, 2026

coderabbitai Bot reviewed May 29, 2026

github-advanced-security AI found potential problems May 31, 2026

Comment thread tests/test_webhooks.py

await _post("https://hooks.slack.com/services/T000/B000/SECRET?token=abc123", {"x": 1})

assert "hooks.slack.com" in caplog.text

Shayan-Bhowmik added 2 commits June 1, 2026 16:15

Repaired the higher vulnerabilities

39ec464

Merge Conflicts Solved

941fa07

