feat(skills): harden the file upload validation section in django-security#2338
feat(skills): harden the file upload validation section in django-security#2338jvirgovic wants to merge 4 commits into
Conversation
📝 WalkthroughSummary by CodeRabbit
WalkthroughThe File Upload Security section in ChangesFile Upload Security Guidance Update
Estimated code review effort🎯 2 (Simple) | ⏱️ ~10 minutes Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@skills/django-security/SKILL.md`:
- Around line 409-419: The validate_file_type function reads magic bytes from
the current cursor position and only resets afterward, which fails if the file
has already been read elsewhere. Capture the original file position using tell()
at the start of the function, seek to position 0 before reading the buffer for
mime type detection, and restore the original file offset after validation using
a finally block to ensure the position is always restored regardless of whether
validation succeeds or raises an error.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 73f9ced7-27ad-452e-8d6c-cb7a1a1e2e68
📒 Files selected for processing (1)
skills/django-security/SKILL.md
📜 Review details
⏰ Context from checks skipped due to timeout. (1)
- GitHub Check: Greptile Review
🧰 Additional context used
📓 Path-based instructions (3)
skills/**/*.md
📄 CodeRabbit inference engine (CLAUDE.md)
Skills should be formatted as Markdown with clear sections for When to Use, How It Works, and Examples.
Files:
skills/django-security/SKILL.md
{agents,skills,commands}/**/*.md
📄 CodeRabbit inference engine (CLAUDE.md)
Use lowercase filenames with hyphens (e.g.,
python-reviewer.md,tdd-workflow.md) for agents, skills, and commands.
Files:
skills/django-security/SKILL.md
{skills,commands,agents,rules}/**
⚙️ CodeRabbit configuration file
{skills,commands,agents,rules}/**: Focus on prompt-injection resilience, tool-permission scope, destructive action guards, and secret exfiltration risks.
Files:
skills/django-security/SKILL.md
🧠 Learnings (2)
📚 Learning: 2026-03-15T19:02:43.245Z
Learnt from: imrobinsingh
Repo: affaan-m/everything-claude-code PR: 503
File: skills/data-scraper-agent/SKILL.md:1-748
Timestamp: 2026-03-15T19:02:43.245Z
Learning: In this repository, skill folders should use a lowercase-hyphen name (e.g., data-scraper-agent, claude-api) and the skill description file inside each folder should be named SKILL.md (uppercase). Do not flag SKILL.md as a naming violation; treat SKILL.md as the canonical file name inside each skill directory.
Applied to files:
skills/django-security/SKILL.md
📚 Learning: 2026-04-15T15:52:59.963Z
Learnt from: manja316
Repo: affaan-m/everything-claude-code PR: 1360
File: skills/security-bounty-hunter/SKILL.md:11-18
Timestamp: 2026-04-15T15:52:59.963Z
Learning: In this repository’s skills documentation (skills/**/SKILL.md), use the canonical auto-activation skill section header `## When to Activate`—do not use `## When to Use`. CONTRIBUTING.md and docs/SKILL-DEVELOPMENT-GUIDE.md confirm the required header, and existing skills follow this convention. This header is important for the auto-activation mechanism to detect the correct section.
Applied to files:
skills/django-security/SKILL.md
🪛 SkillSpector (2.2.3)
skills/django-security/SKILL.md
[error] 50: [E2] Env Variable Harvesting: Code accesses environment variables that may contain secrets (API keys, tokens). This is a common pattern for credential theft.
Remediation: Avoid reading sensitive env vars (API keys, tokens) unless strictly required. Use secrets managers or secure config. Never log or transmit credentials.
(Data Exfiltration (E2))
[error] 566: [PE3] Credential Access: Code accesses credential files (SSH keys, AWS credentials, etc.). This could indicate credential theft attempts.
Remediation: Remove references to credential paths. Use environment variables or secrets managers. For docs, use placeholder paths (e.g., /path/to/config). Never load .env or token files in production code paths.
(Privilege Escalation (PE3))
[error] 573: [PE3] Credential Access: Code accesses credential files (SSH keys, AWS credentials, etc.). This could indicate credential theft attempts.
Remediation: Remove references to credential paths. Use environment variables or secrets managers. For docs, use placeholder paths (e.g., /path/to/config). Never load .env or token files in production code paths.
(Privilege Escalation (PE3))
| def validate_file_type(value): | ||
| """Validate file type using magic bytes and cross-check extension.""" | ||
| mime = magic.from_buffer(value.read(2048), mime=True) | ||
| value.seek(0) | ||
|
|
||
| if mime not in ALLOWED_MIMES: | ||
| raise ValidationError(f'Unsupported file type: {mime}') | ||
|
|
||
| ext = os.path.splitext(value.name)[1].lower() | ||
| if ext not in MIME_TO_EXTENSIONS.get(mime, set()): | ||
| raise ValidationError('File extension does not match file content.') |
There was a problem hiding this comment.
Read from the start, then restore the original file position.
Both validator variants sniff from the current cursor position and only rewind afterward. If anything has already read the upload, this validates the wrong bytes and can reject a valid file. Capture tell(), seek to 0 before sniffing, and restore the original offset in a finally block.
✅ Proposed fix
def validate_file_type(value):
"""Validate file type using magic bytes and cross-check extension."""
- mime = magic.from_buffer(value.read(2048), mime=True)
- value.seek(0)
+ pos = value.tell()
+ try:
+ value.seek(0)
+ mime = magic.from_buffer(value.read(2048), mime=True)
+ finally:
+ value.seek(pos)Also applies to: 441-451
🧰 Tools
🪛 SkillSpector (2.2.3)
[error] 50: [E2] Env Variable Harvesting: Code accesses environment variables that may contain secrets (API keys, tokens). This is a common pattern for credential theft.
Remediation: Avoid reading sensitive env vars (API keys, tokens) unless strictly required. Use secrets managers or secure config. Never log or transmit credentials.
(Data Exfiltration (E2))
[error] 566: [PE3] Credential Access: Code accesses credential files (SSH keys, AWS credentials, etc.). This could indicate credential theft attempts.
Remediation: Remove references to credential paths. Use environment variables or secrets managers. For docs, use placeholder paths (e.g., /path/to/config). Never load .env or token files in production code paths.
(Privilege Escalation (PE3))
[error] 573: [PE3] Credential Access: Code accesses credential files (SSH keys, AWS credentials, etc.). This could indicate credential theft attempts.
Remediation: Remove references to credential paths. Use environment variables or secrets managers. For docs, use placeholder paths (e.g., /path/to/config). Never load .env or token files in production code paths.
(Privilege Escalation (PE3))
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@skills/django-security/SKILL.md` around lines 409 - 419, The
validate_file_type function reads magic bytes from the current cursor position
and only resets afterward, which fails if the file has already been read
elsewhere. Capture the original file position using tell() at the start of the
function, seek to position 0 before reading the buffer for mime type detection,
and restore the original file offset after validation using a finally block to
ensure the position is always restored regardless of whether validation succeeds
or raises an error.
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
skills/django-security/SKILL.md (2)
409-420:⚠️ Potential issue | 🔴 Critical | ⚡ Quick winFile position must be captured, preserved, and restored in a finally block.
The validator reads magic bytes from the current cursor position (line 411) and only seeks back to 0 afterward (line 412). If the file has been read elsewhere in the upload pipeline, the cursor will not be at position 0, and this code will sniff the wrong bytes. This can cause valid files to be rejected or (worse) bypass validation.
Capture the original position using
tell(), seek to 0 before sniffing, and unconditionally restore the original position in afinallyblock.🔧 Proposed fix
def validate_file_type(value): """Validate file type using magic bytes and cross-check extension.""" - mime = magic.from_buffer(value.read(2048), mime=True) - value.seek(0) + pos = value.tell() + try: + value.seek(0) + mime = magic.from_buffer(value.read(2048), mime=True) + finally: + value.seek(pos) if mime not in ALLOWED_MIMES:🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@skills/django-security/SKILL.md` around lines 409 - 420, The validate_file_type function must preserve the original file cursor position since it may not be at position 0 when called. Capture the original position at the start of the function using the tell() method, then seek to position 0 before reading the magic bytes. Wrap the validation logic in a try-finally block that unconditionally restores the original position in the finally clause, ensuring the cursor is restored even if an exception occurs during validation.
393-407:⚠️ Potential issue | 🟡 MinorAdd "When to Activate" subsection to File Upload Security section.
The File Upload Security section (line 389) lacks a
### When to Activatesubsection, jumping directly to### File Validation. Add a brief subsection describing when file upload validation should be applied to align with the skill's auto-activation mechanism documented at line 12.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@skills/django-security/SKILL.md` around lines 393 - 407, The File Upload Security section is missing a "When to Activate" subsection that should appear before the "File Validation" subsection. Add a "### When to Activate" heading after the main section header for File Upload Security and include a brief description of when file upload validation should be applied to the application. Reference and align this activation guidance with the skill's auto-activation mechanism documented earlier in the document (at line 12) to maintain consistency with the document structure.Source: Learnings
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Outside diff comments:
In `@skills/django-security/SKILL.md`:
- Around line 409-420: The validate_file_type function must preserve the
original file cursor position since it may not be at position 0 when called.
Capture the original position at the start of the function using the tell()
method, then seek to position 0 before reading the magic bytes. Wrap the
validation logic in a try-finally block that unconditionally restores the
original position in the finally clause, ensuring the cursor is restored even if
an exception occurs during validation.
- Around line 393-407: The File Upload Security section is missing a "When to
Activate" subsection that should appear before the "File Validation" subsection.
Add a "### When to Activate" heading after the main section header for File
Upload Security and include a brief description of when file upload validation
should be applied to the application. Reference and align this activation
guidance with the skill's auto-activation mechanism documented earlier in the
document (at line 12) to maintain consistency with the document structure.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: e788b6ae-41e4-4573-bdcf-a404be9fb6dc
📒 Files selected for processing (1)
skills/django-security/SKILL.md
📜 Review details
⏰ Context from checks skipped due to timeout. (1)
- GitHub Check: Greptile Review
🧰 Additional context used
📓 Path-based instructions (3)
skills/**/*.md
📄 CodeRabbit inference engine (CLAUDE.md)
Skills should be formatted as Markdown with clear sections for When to Use, How It Works, and Examples.
Files:
skills/django-security/SKILL.md
{agents,skills,commands}/**/*.md
📄 CodeRabbit inference engine (CLAUDE.md)
Use lowercase filenames with hyphens (e.g.,
python-reviewer.md,tdd-workflow.md) for agents, skills, and commands.
Files:
skills/django-security/SKILL.md
{skills,commands,agents,rules}/**
⚙️ CodeRabbit configuration file
{skills,commands,agents,rules}/**: Focus on prompt-injection resilience, tool-permission scope, destructive action guards, and secret exfiltration risks.
Files:
skills/django-security/SKILL.md
🧠 Learnings (2)
📚 Learning: 2026-03-15T19:02:43.245Z
Learnt from: imrobinsingh
Repo: affaan-m/everything-claude-code PR: 503
File: skills/data-scraper-agent/SKILL.md:1-748
Timestamp: 2026-03-15T19:02:43.245Z
Learning: In this repository, skill folders should use a lowercase-hyphen name (e.g., data-scraper-agent, claude-api) and the skill description file inside each folder should be named SKILL.md (uppercase). Do not flag SKILL.md as a naming violation; treat SKILL.md as the canonical file name inside each skill directory.
Applied to files:
skills/django-security/SKILL.md
📚 Learning: 2026-04-15T15:52:59.963Z
Learnt from: manja316
Repo: affaan-m/everything-claude-code PR: 1360
File: skills/security-bounty-hunter/SKILL.md:11-18
Timestamp: 2026-04-15T15:52:59.963Z
Learning: In this repository’s skills documentation (skills/**/SKILL.md), use the canonical auto-activation skill section header `## When to Activate`—do not use `## When to Use`. CONTRIBUTING.md and docs/SKILL-DEVELOPMENT-GUIDE.md confirm the required header, and existing skills follow this convention. This header is important for the auto-activation mechanism to detect the correct section.
Applied to files:
skills/django-security/SKILL.md
🪛 SkillSpector (2.2.3)
skills/django-security/SKILL.md
[error] 50: [E2] Env Variable Harvesting: Code accesses environment variables that may contain secrets (API keys, tokens). This is a common pattern for credential theft.
Remediation: Avoid reading sensitive env vars (API keys, tokens) unless strictly required. Use secrets managers or secure config. Never log or transmit credentials.
(Data Exfiltration (E2))
[error] 566: [PE3] Credential Access: Code accesses credential files (SSH keys, AWS credentials, etc.). This could indicate credential theft attempts.
Remediation: Remove references to credential paths. Use environment variables or secrets managers. For docs, use placeholder paths (e.g., /path/to/config). Never load .env or token files in production code paths.
(Privilege Escalation (PE3))
[error] 573: [PE3] Credential Access: Code accesses credential files (SSH keys, AWS credentials, etc.). This could indicate credential theft attempts.
Remediation: Remove references to credential paths. Use environment variables or secrets managers. For docs, use placeholder paths (e.g., /path/to/config). Never load .env or token files in production code paths.
(Privilege Escalation (PE3))
🔇 Additional comments (1)
skills/django-security/SKILL.md (1)
435-451: The fallback implementation is consistent but lacks proper runtime gating.Both
validate_file_typefunctions apply identical validation logic:
- File position preservation (read + seek)
ALLOWED_MIMESenforcement- Extension verification against
MIME_TO_EXTENSIONSHowever, the code shows these as separate documentation examples rather than a runtime fallback pattern. There's no try-import or except block that would automatically fall back from
magictofiletypeon ImportError. Developers must manually choose which implementation to use; they won't get automatic fallback behavior. Either wire a try-except pattern around the imports or clarify that this is a manual swap, not an automatic failover.
There was a problem hiding this comment.
♻️ Duplicate comments (1)
skills/django-security/SKILL.md (1)
409-419:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winPreserve the original file cursor before sniffing.
Both snippets still read from the current position and only rewind afterward, so any prior read can make validation inspect the wrong bytes and reject a valid upload. Capture
tell(), seek to0before sniffing, and restore the original offset infinally.🔧 Proposed fix
def validate_file_type(value): """Validate file type using magic bytes and cross-check extension.""" - mime = magic.from_buffer(value.read(2048), mime=True) - value.seek(0) + pos = value.tell() + try: + value.seek(0) + mime = magic.from_buffer(value.read(2048), mime=True) + finally: + value.seek(pos)Also applies to: 455-465
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@skills/django-security/SKILL.md` around lines 409 - 419, The validate_file_type function reads from the current file position and only rewinds to position 0 afterward, which fails to preserve the original cursor position if the file was already read. Capture the current position using value.tell() before reading, seek to position 0 before performing the magic bytes check, and use a finally block to restore the original cursor position after the validation completes to ensure subsequent operations access the file from the correct offset.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Duplicate comments:
In `@skills/django-security/SKILL.md`:
- Around line 409-419: The validate_file_type function reads from the current
file position and only rewinds to position 0 afterward, which fails to preserve
the original cursor position if the file was already read. Capture the current
position using value.tell() before reading, seek to position 0 before performing
the magic bytes check, and use a finally block to restore the original cursor
position after the validation completes to ensure subsequent operations access
the file from the correct offset.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 5415e11e-de23-4cb8-bc1b-7b689db4055c
📒 Files selected for processing (1)
skills/django-security/SKILL.md
📜 Review details
⏰ Context from checks skipped due to timeout. (1)
- GitHub Check: Greptile Review
🧰 Additional context used
📓 Path-based instructions (3)
skills/**/*.md
📄 CodeRabbit inference engine (CLAUDE.md)
Skills should be formatted as Markdown with clear sections for When to Use, How It Works, and Examples.
Files:
skills/django-security/SKILL.md
{agents,skills,commands}/**/*.md
📄 CodeRabbit inference engine (CLAUDE.md)
Use lowercase filenames with hyphens (e.g.,
python-reviewer.md,tdd-workflow.md) for agents, skills, and commands.
Files:
skills/django-security/SKILL.md
{skills,commands,agents,rules}/**
⚙️ CodeRabbit configuration file
{skills,commands,agents,rules}/**: Focus on prompt-injection resilience, tool-permission scope, destructive action guards, and secret exfiltration risks.
Files:
skills/django-security/SKILL.md
🧠 Learnings (2)
📚 Learning: 2026-03-15T19:02:43.245Z
Learnt from: imrobinsingh
Repo: affaan-m/everything-claude-code PR: 503
File: skills/data-scraper-agent/SKILL.md:1-748
Timestamp: 2026-03-15T19:02:43.245Z
Learning: In this repository, skill folders should use a lowercase-hyphen name (e.g., data-scraper-agent, claude-api) and the skill description file inside each folder should be named SKILL.md (uppercase). Do not flag SKILL.md as a naming violation; treat SKILL.md as the canonical file name inside each skill directory.
Applied to files:
skills/django-security/SKILL.md
📚 Learning: 2026-04-15T15:52:59.963Z
Learnt from: manja316
Repo: affaan-m/everything-claude-code PR: 1360
File: skills/security-bounty-hunter/SKILL.md:11-18
Timestamp: 2026-04-15T15:52:59.963Z
Learning: In this repository’s skills documentation (skills/**/SKILL.md), use the canonical auto-activation skill section header `## When to Activate`—do not use `## When to Use`. CONTRIBUTING.md and docs/SKILL-DEVELOPMENT-GUIDE.md confirm the required header, and existing skills follow this convention. This header is important for the auto-activation mechanism to detect the correct section.
Applied to files:
skills/django-security/SKILL.md
🪛 SkillSpector (2.2.3)
skills/django-security/SKILL.md
[error] 50: [E2] Env Variable Harvesting: Code accesses environment variables that may contain secrets (API keys, tokens). This is a common pattern for credential theft.
Remediation: Avoid reading sensitive env vars (API keys, tokens) unless strictly required. Use secrets managers or secure config. Never log or transmit credentials.
(Data Exfiltration (E2))
[error] 580: [PE3] Credential Access: Code accesses credential files (SSH keys, AWS credentials, etc.). This could indicate credential theft attempts.
Remediation: Remove references to credential paths. Use environment variables or secrets managers. For docs, use placeholder paths (e.g., /path/to/config). Never load .env or token files in production code paths.
(Privilege Escalation (PE3))
[error] 587: [PE3] Credential Access: Code accesses credential files (SSH keys, AWS credentials, etc.). This could indicate credential theft attempts.
Remediation: Remove references to credential paths. Use environment variables or secrets managers. For docs, use placeholder paths (e.g., /path/to/config). Never load .env or token files in production code paths.
(Privilege Escalation (PE3))
daltino
left a comment
There was a problem hiding this comment.
This PR strengthens the file upload validation logic in django-security by moving from validating file extensions to MIME type validation using the python-magic library. This is a robust improvement since MIME types are harder to spoof than file extensions. The ALLOWED_MIMES and MIME_TO_EXTENSIONS mappings are a good addition for better control and clarity. The changes themselves are concise and align well with the goal of enhancing security.
What Changed
Replaced the insecure extension-only file validation in the Django Security skill with magic byte validation using python-magic (with a pure-Python filetype alternative). The validator now reads file signatures to determine actual MIME type and cross-checks against the declared extension.
Why This Change
Extension-based file validation is trivially bypassed by renaming files. Magic byte inspection verifies the actual file content, providing meaningful upload security rather than a false sense of safety.
Testing Done
node tests/run-all.js)Type of Change
fix:Bug fixfeat:New featurerefactor:Code refactoringdocs:Documentationtest:Testschore:Maintenance/toolingci:CI/CD changesSecurity & Quality Checklist
Documentation