Fix SSRF redirect bypass in scraping fetches by theCyberTech · Pull Request #6331 · crewAIInc/crewAI

theCyberTech · 2026-06-25T04:04:00Z

Summary

Fixes the worker-side SSRF redirect bypass in scraping and URL-backed RAG loaders by routing direct URL fetches through a redirect-aware safe HTTP helper.

Changes

Adds safe_get() to validate the initial URL and every redirect target before following redirects.
Replaces direct worker-side requests.get() calls in ScrapeWebsiteTool, ScrapeElementFromWebsiteTool, and URL-backed RAG loaders.
Adds regression tests proving a public-looking URL that redirects to 127.0.0.1 is blocked before the second request is made.

Validation

uv run pytest lib/crewai-tools/tests/utilities/test_safe_requests.py lib/crewai-tools/tests/utilities/test_safe_path.py lib/crewai-tools/tests/rag/test_webpage_loader.py lib/crewai-tools/tests/rag/test_docx_loader.py lib/crewai-tools/tests/rag/test_json_loader.py lib/crewai-tools/tests/rag/test_csv_loader.py lib/crewai-tools/tests/rag/test_xml_loader.py lib/crewai-tools/tests/rag/test_mdx_loader.py -q → 97 passed
uv run ruff check ... on touched files → passed
uv run mypy ... on touched files → passed

Notes

This closes the redirect-following bypass for worker-side direct fetches. Full DNS TOCTOU/IP pinning remains a separate lower-level hardening follow-up.

Summary by CodeRabbit

Security
- Web content, PDF, DOCX, and documentation-site downloads now use safer URL fetching to reduce the risk of unsafe redirects and restricted-network access.
- Redirects are handled with stricter safety checks, including limiting redirect chains and sanitizing credential-related data on cross-origin redirects.
Bug Fixes
- Improved reliability and more consistent failure behavior when fetching remote content and following redirects.
Tests
- Expanded coverage for redirect safety and credential/cookie handling during redirects.

coderabbitai · 2026-06-25T04:04:43Z

📝 Walkthrough

Walkthrough

The PR adds safe_get, a redirect-aware HTTP GET helper, and switches several loaders and website scraping tools to use it. It also adds DNS fixtures and tests covering private IP rejection, redirect validation, safe redirects, and redirect limits.

Changes

Redirect-safe HTTP fetching

Layer / File(s)	Summary
safe_get redirect handling `lib/crewai-tools/src/crewai_tools/security/safe_requests.py`	`safe_get` validates the initial URL, disables automatic redirects, validates redirect targets, caps redirect hops, and returns responses with redirect history.
RAG loader fetch calls `lib/crewai-tools/src/crewai_tools/rag/loaders/docs_site_loader.py`, `lib/crewai-tools/src/crewai_tools/rag/loaders/docx_loader.py`, `lib/crewai-tools/src/crewai_tools/rag/loaders/pdf_loader.py`, `lib/crewai-tools/src/crewai_tools/rag/loaders/utils.py`, `lib/crewai-tools/src/crewai_tools/rag/loaders/webpage_loader.py`	The docs site, DOCX, PDF, URL utility, and webpage loader modules import `safe_get` and replace direct `requests.get` calls while keeping parsing, temporary-file, and error-handling flows unchanged.
Website scraping tools `lib/crewai-tools/src/crewai_tools/tools/scrape_element_from_website/scrape_element_from_website.py`, `lib/crewai-tools/src/crewai_tools/tools/scrape_website_tool/scrape_website_tool.py`	The website scraping tools import `safe_get` and replace the prior `validate_url` plus `requests.get` fetch path in `_run` before HTML parsing and element extraction continue.
safe_get tests and DNS fixture `lib/crewai-tools/tests/rag/conftest.py`, `lib/crewai-tools/tests/utilities/test_safe_requests.py`	The test fixture pins DNS for example hosts, and the new tests cover direct private-IP rejection, redirect target validation, safe relative redirects, and redirect limit failures.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 20.00% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title clearly matches the main security fix: blocking SSRF redirect bypasses in scraping-related fetches.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

📝 Generate docstrings

Create stacked PR
Commit on current branch

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch fix/scrape-redirect-ssrf

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands.}

corridor-security

Security Issues

Credential leakage across redirects
The new safe_get() manually follows redirects but reuses the original headers and cookies for each redirected host. A user-controlled URL or open redirect can cause Authorization/Cookie values intended for one origin to be sent to an attacker-controlled public host.

Summary: This PR adds redirect-aware URL validation for scraping and URL-backed RAG fetches, reducing the direct SSRF redirect bypass but introducing credential forwarding risk during redirects.

Risk: Medium risk. The new public URL fetch helper crosses an external trust boundary and can leak configured authentication headers or cookies when following attacker-controlled cross-origin redirects.

Recommendations:

Strip Authorization, Cookie, and explicit cookies when the redirect target changes origin, or reject cross-origin redirects whenever credentials are present.
Add regression tests for redirects from an authenticated trusted origin to a different public host to ensure credentials are not forwarded.

Copilot

Pull request overview

This PR addresses an SSRF redirect bypass in worker-side scraping and URL-backed RAG loaders by introducing a redirect-aware HTTP helper that re-validates each redirect target before issuing the next request.

Changes:

Added safe_get() in crewai_tools.security.safe_requests to validate the initial URL and each redirect target (manual redirect handling with allow_redirects=False).
Replaced direct requests.get() usage with safe_get() across scraping tools and multiple RAG loaders.
Added regression tests and test DNS fixtures to ensure redirects to internal IPs are blocked before the second request.

Reviewed changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated 2 comments.

Show a summary per file

File	Description
lib/crewai-tools/src/crewai_tools/security/safe_requests.py	Adds `safe_get()` to validate URLs across redirects.
lib/crewai-tools/src/crewai_tools/tools/scrape_website_tool/scrape_website_tool.py	Routes website fetch through `safe_get()` instead of `requests.get()`.
lib/crewai-tools/src/crewai_tools/tools/scrape_element_from_website/scrape_element_from_website.py	Routes element-scrape fetch through `safe_get()` instead of `requests.get()`.
lib/crewai-tools/src/crewai_tools/rag/loaders/webpage_loader.py	Uses `safe_get()` for webpage loading.
lib/crewai-tools/src/crewai_tools/rag/loaders/utils.py	Uses `safe_get()` for shared URL-loading helper.
lib/crewai-tools/src/crewai_tools/rag/loaders/pdf_loader.py	Uses `safe_get()` for PDF URL downloads.
lib/crewai-tools/src/crewai_tools/rag/loaders/docx_loader.py	Uses `safe_get()` for DOCX URL downloads.
lib/crewai-tools/src/crewai_tools/rag/loaders/docs_site_loader.py	Uses `safe_get()` for docs-site fetching.
lib/crewai-tools/tests/utilities/test_safe_requests.py	Adds redirect-focused SSRF regression tests for `safe_get()`.
lib/crewai-tools/tests/rag/conftest.py	Adds autouse DNS fixture to stabilize URL validation in RAG tests.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

coderabbitai

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@lib/crewai-tools/src/crewai_tools/rag/loaders/utils.py`:
- Line 37: The redirect handling in safe_get currently carries the original
headers across every hop, which can leak secrets to different domains. Update
safe_get in safe_requests.py to track the previous and next URL netlocs during
redirect resolution, and when the host changes, strip sensitive request headers
before the next requests.get call. Make sure the header cleanup covers
Authorization, Authorization-*, Cookie, and any other secret-bearing custom
headers used by callers such as the RAG loaders and scrape tools.

In `@lib/crewai-tools/src/crewai_tools/security/safe_requests.py`:
- Around line 19-25: The redirect handling in safe_requests.py is reusing
request_kwargs unchanged across every hop, which can leak Authorization/Cookie
state and ignore Set-Cookie responses. Update the redirect flow in the request
loop to use a requests.Session, or otherwise rebuild per-hop kwargs in a way
that strips sensitive headers on cross-origin redirects and carries cookies
forward safely; keep the logic localized around the current_url loop and the
request_kwargs/history/redirects_followed handling.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 1c30100e-5382-43c6-bf0a-38da30d36328

📥 Commits

Reviewing files that changed from the base of the PR and between 01fc389 and cd3fa72.

📒 Files selected for processing (10)

lib/crewai-tools/src/crewai_tools/rag/loaders/docs_site_loader.py
lib/crewai-tools/src/crewai_tools/rag/loaders/docx_loader.py
lib/crewai-tools/src/crewai_tools/rag/loaders/pdf_loader.py
lib/crewai-tools/src/crewai_tools/rag/loaders/utils.py
lib/crewai-tools/src/crewai_tools/rag/loaders/webpage_loader.py
lib/crewai-tools/src/crewai_tools/security/safe_requests.py
lib/crewai-tools/src/crewai_tools/tools/scrape_element_from_website/scrape_element_from_website.py
lib/crewai-tools/src/crewai_tools/tools/scrape_website_tool/scrape_website_tool.py
lib/crewai-tools/tests/rag/conftest.py
lib/crewai-tools/tests/utilities/test_safe_requests.py

coderabbitai

♻️ Duplicate comments (1)

lib/crewai-tools/src/crewai_tools/security/safe_requests.py (1)

38-48: 🔒 Security & Privacy | 🟠 Major

Strip non-header credentials before the next hop.

auth remains in request_kwargs after header/cookie stripping, causing credentials to be sent to cross-origin redirect targets. params is also replayed to redirect targets, which can leak initial-query secrets that should not persist across hops.

Remove auth on all redirect hops, and drop first-hop params before following any redirect.

 def _strip_cross_origin_credentials(request_kwargs: dict[str, Any]) -> dict[str, Any]:
     sanitized = {**request_kwargs}
     headers = sanitized.get("headers")
     if headers:
         sanitized["headers"] = {
             key: value
             for key, value in headers.items()
             if not _is_sensitive_header(str(key))
         }
     sanitized.pop("cookies", None)
+    sanitized.pop("auth", None)
     return sanitized

-        if not _same_origin(current_url, redirect_url):
-            request_kwargs = _strip_cross_origin_credentials(request_kwargs)
+        next_request_kwargs = {**request_kwargs}
+        next_request_kwargs.pop("params", None)
+        if not _same_origin(current_url, redirect_url):
+            next_request_kwargs = _strip_cross_origin_credentials(next_request_kwargs)
+        request_kwargs = next_request_kwargs

Applies to: 83–84 as well.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@lib/crewai-tools/src/crewai_tools/security/safe_requests.py` around lines 38
- 48, Update _strip_cross_origin_credentials in safe_requests.py so it also
removes non-header credentials before forwarding a redirected request: strip
auth on every cross-origin hop, and clear params as well as cookies so initial
query secrets are not replayed. Keep the existing header filtering logic, but
make sure the sanitized request_kwargs returned by
_strip_cross_origin_credentials no longer contains auth or params when building
the next request.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Duplicate comments:
In `@lib/crewai-tools/src/crewai_tools/security/safe_requests.py`:
- Around line 38-48: Update _strip_cross_origin_credentials in safe_requests.py
so it also removes non-header credentials before forwarding a redirected
request: strip auth on every cross-origin hop, and clear params as well as
cookies so initial query secrets are not replayed. Keep the existing header
filtering logic, but make sure the sanitized request_kwargs returned by
_strip_cross_origin_credentials no longer contains auth or params when building
the next request.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: e695953c-0583-47eb-809c-da84e2c9478a

📥 Commits

Reviewing files that changed from the base of the PR and between cd3fa72 and 7455e51.

📒 Files selected for processing (2)

lib/crewai-tools/src/crewai_tools/security/safe_requests.py
lib/crewai-tools/tests/utilities/test_safe_requests.py

corridor-security

Security Issues

Server-Side Request Forgery (SSRF) via DNS Rebinding (TOCTOU)
The new safe_get() function validates the hostname/IP via validate_url() but then performs the actual HTTP request with a fresh DNS resolution (requests.get(current_url, ...)). An attacker controlling DNS for a supplied URL can return a public IP during validation and then immediately rebind to a private/reserved IP for the actual connection, enabling internal network access (e.g., 127.0.0.1, 169.254.169.254). This time-of-check/time-of-use gap exists for the initial request and each redirect hop, allowing a realistic SSRF bypass despite validation.

Recommendations:

Pin the resolved IP address from validate_url() and perform the connection to that IP while setting the Host header (and SNI for TLS) to the original hostname, or
Use a custom requests/urllib3 adapter that enforces a trusted DNS resolution result for the lifetime of the request and redirect chain, rejecting any re-resolution to private/reserved ranges.
Maintain the same pinned-IP set across redirects (re-validate targets and re-pin before requesting), or abort on cross-origin redirects that cannot be safely pinned.

corridor-security · 2026-06-25T18:06:31Z

+    redirects_followed = 0
+
+    while True:
+        response = requests.get(current_url, timeout=timeout, **request_kwargs)


The validation step checks the URL against private/reserved IP ranges, but the subsequent request performs a new DNS resolution, creating a TOCTOU window for DNS rebinding that can re-route the connection to an internal address.

current_url = validate_url(url) ... response = requests.get(current_url, timeout=timeout, **request_kwargs)

An attacker who controls the DNS for a supplied URL can return a public IP at validation time and then immediately rebind to 127.0.0.1 or another blocked IP for the actual connection. This enables SSRF into internal services despite the validation.

Remediation:

Pin the resolved IP from validate_url() and perform the request to that IP while setting the Host header (and SNI for TLS) to the original hostname, or

Mount a custom urllib3/requests adapter that uses a trusted DNS resolution result for the entire request (including redirects) and refuses connections if the resolved IP falls into a private/reserved range.

Apply the same IP pinning logic to each validated redirect target before following it; abort when safe pinning cannot be maintained.

For more details, see the finding in Corridor.

Provide feedback: Reply with whether this is a valid vulnerability or false positive to help improve Corridor's accuracy.

Validate redirects for scraping URL fetches

cd3fa72

Copilot AI review requested due to automatic review settings June 25, 2026 04:04

github-actions Bot added the size/M label Jun 25, 2026

Copilot started reviewing on behalf of theCyberTech June 25, 2026 04:04 View session

corridor-security Bot reviewed Jun 25, 2026

View reviewed changes

Comment thread lib/crewai-tools/src/crewai_tools/security/safe_requests.py

Copilot AI reviewed Jun 25, 2026

View reviewed changes

Comment thread lib/crewai-tools/src/crewai_tools/security/safe_requests.py

Comment thread lib/crewai-tools/src/crewai_tools/rag/loaders/docs_site_loader.py

coderabbitai Bot reviewed Jun 25, 2026

View reviewed changes

Comment thread lib/crewai-tools/src/crewai_tools/rag/loaders/utils.py

Comment thread lib/crewai-tools/src/crewai_tools/security/safe_requests.py

Prevent credential forwarding across redirects

7455e51

github-actions Bot added size/L and removed size/M labels Jun 25, 2026

coderabbitai Bot reviewed Jun 25, 2026

View reviewed changes

Merge branch 'main' into fix/scrape-redirect-ssrf

c810ff2

corridor-security Bot reviewed Jun 25, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Fix SSRF redirect bypass in scraping fetches#6331

Fix SSRF redirect bypass in scraping fetches#6331
theCyberTech wants to merge 3 commits into
mainfrom
fix/scrape-redirect-ssrf

theCyberTech commented Jun 25, 2026 •

edited by coderabbitai Bot

Loading

Uh oh!

coderabbitai Bot commented Jun 25, 2026 •

edited

Loading

Walkthrough

Changes

❌ Failed checks (1 warning)

Uh oh!

corridor-security Bot left a comment

Uh oh!

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Uh oh!

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Uh oh!

corridor-security Bot left a comment

Uh oh!

corridor-security Bot Jun 25, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Uh oh!

Conversation

theCyberTech commented Jun 25, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Changes

Validation

Notes

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented Jun 25, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

❌ Failed checks (1 warning)

Uh oh!

corridor-security Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

corridor-security Bot left a comment

Choose a reason for hiding this comment

Uh oh!

corridor-security Bot Jun 25, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

theCyberTech commented Jun 25, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented Jun 25, 2026 •

edited

Loading