Fix symlink path traversal in skill archive extraction#6235
Conversation
`_safe_extractall` (the Python < 3.12 fallback used by `crewai skills` archive unpacking) validated each member's *name* against the destination but never validated symlink/hardlink *targets*. A malicious skill tarball could plant a symlink escaping the destination (e.g. `link -> /home/user/.ssh`) followed by a regular member written through it (`link/authorized_keys`), escaping `dest` even though every member name resolves inside it — the classic symlink-extraction traversal. The 3.12+ path (`extractall(..., filter="data")`) already blocks this; the fallback now mirrors it by rejecting absolute link targets and any link target that resolves outside the destination directory. Adds regression tests covering absolute and relative escaping symlinks plus benign in-tree symlinks and ordinary archives. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
📝 WalkthroughWalkthrough
ChangesSymlink/hardlink path traversal hardening in
Estimated code review effort🎯 2 (Simple) | ⏱️ ~10 minutes 🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Pull request overview
This PR hardens the skill archive extraction path used on Python < 3.12 by extending _safe_extractall to validate symlink/hardlink targets (not just member names), preventing link-based path traversal that could otherwise enable writes outside the intended destination directory.
Changes:
- Add link-target validation for symlink and hardlink tar members in the Python < 3.12 extraction fallback.
- Add regression tests covering escaping and benign symlink scenarios, plus baseline extraction behavior.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
lib/cli/src/crewai_cli/experimental/skills/main.py |
Strengthens _safe_extractall with link-target validation to prevent symlink/hardlink traversal on Python < 3.12. |
lib/cli/tests/skills/test_safe_extract.py |
Adds regression tests ensuring malicious escaping symlinks are rejected and benign archives still extract correctly. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
There was a problem hiding this comment.
🧹 Nitpick comments (2)
lib/cli/tests/skills/test_safe_extract.py (2)
70-88: 💤 Low valueConsider verifying the symlink was actually created.
The test confirms
real.txtextracts correctly but doesn't assert thatalias.txtexists or is a symlink. Adding an assertion would strengthen the test's coverage of the "allow" path for symlinks.assert (dest / "real.txt").read_bytes() == b"hi" + assert (dest / "alias.txt").is_symlink()🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@lib/cli/tests/skills/test_safe_extract.py` around lines 70 - 88, The test_allows_benign_relative_symlink test only verifies that the real.txt file was extracted with correct content but does not verify that the alias.txt symlink was actually created during extraction. Add assertions after the _safe_extractall call to verify that the alias.txt symlink exists at the destination path and confirm it is actually a symbolic link (using is_symlink() method) and that it points to the correct target (real.txt).
1-107: ⚡ Quick winConsider adding hardlink test coverage.
The implementation validates both
issym()andislnk()(hardlinks) with different anchor semantics, but only symlinks are tested. A hardlink escape test would confirm themember.islnk()branch (anchored to archive root) behaves correctly.def test_blocks_hardlink_escaping_destination(tmp_path: Path) -> None: """A hardlink whose target escapes dest is rejected.""" dest = tmp_path / "dest" dest.mkdir() def build(tf: tarfile.TarFile) -> None: link = tarfile.TarInfo("escape") link.type = tarfile.LNKTYPE link.linkname = "../outside.txt" # escapes archive root tf.addfile(link) with _tar_from_members(build) as tf: with pytest.raises(ValueError, match="escaping destination"): _safe_extractall(tf, dest)🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@lib/cli/tests/skills/test_safe_extract.py` around lines 1 - 107, Add a new test function named test_blocks_hardlink_escaping_destination after the existing symlink tests to validate that hardlinks with escaping targets are properly rejected. The test should create a tar archive with a hardlink member (using tarfile.LNKTYPE) whose linkname escapes the archive root, call _safe_extractall with the archive and destination path, and assert that a ValueError is raised with a message matching "escaping destination". This ensures the hardlink validation branch (member.islnk()) in the _safe_extractall function works correctly with its archive-root-based anchor semantics, matching the coverage already provided for symlinks.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@lib/cli/tests/skills/test_safe_extract.py`:
- Around line 70-88: The test_allows_benign_relative_symlink test only verifies
that the real.txt file was extracted with correct content but does not verify
that the alias.txt symlink was actually created during extraction. Add
assertions after the _safe_extractall call to verify that the alias.txt symlink
exists at the destination path and confirm it is actually a symbolic link (using
is_symlink() method) and that it points to the correct target (real.txt).
- Around line 1-107: Add a new test function named
test_blocks_hardlink_escaping_destination after the existing symlink tests to
validate that hardlinks with escaping targets are properly rejected. The test
should create a tar archive with a hardlink member (using tarfile.LNKTYPE) whose
linkname escapes the archive root, call _safe_extractall with the archive and
destination path, and assert that a ValueError is raised with a message matching
"escaping destination". This ensures the hardlink validation branch
(member.islnk()) in the _safe_extractall function works correctly with its
archive-root-based anchor semantics, matching the coverage already provided for
symlinks.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro Plus
Run ID: 46717317-8e7d-4b5f-9816-ac6bf661dc83
📒 Files selected for processing (2)
lib/cli/src/crewai_cli/experimental/skills/main.pylib/cli/tests/skills/test_safe_extract.py
Summary
_safe_extractall— the Python < 3.12 fallback used bycrewai skillswhen unpacking a downloaded skill archive (_unpack_archive) — validated each tar member's name against the destination directory but never validated symlink / hardlink targets.Because the project supports Python
>=3.10, this fallback runs on real installs (3.10 / 3.11). The 3.12+ path usestarfile.extractall(..., filter="data"), which already blocks malicious links; the fallback did not provide the equivalent guard.Impact
A malicious skill tarball can:
dest(e.g.link -> /home/user/.ssh), thenlink/authorized_keys).Every member name resolves inside
dest, so the old check passed — but extraction creates the symlink and then writes through it, landing the payload outsidedest. Classic symlink-extraction path traversal / arbitrary file write from an externally-sourced archive.Fix
_safe_extractallnow, for symlink/hardlink members, rejects:(anchored correctly: hardlink names are relative to the archive root, symlink targets relative to the member's own directory). This mirrors the
filter="data"protection on Python 3.12+.The zip fallback (
_safe_extract_zip) is unaffected —zipfile.extractalldoes not recreate stored symlinks as links.Tests
lib/cli/tests/skills/test_safe_extract.py:dest../../) escaping symlink → rejectedAll 4 pass; the two malicious-link tests fail against the pre-patch code, confirming they are genuine regression guards.
Summary by CodeRabbit
Bug Fixes
Tests