Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
38 changes: 30 additions & 8 deletions skills/ai-security/prompt-injection/SKILL.md
Original file line number Diff line number Diff line change
Expand Up @@ -3,11 +3,11 @@ name: prompt-injection
description: >
Tests LLM applications for prompt injection vulnerabilities per OWASP LLM01:2025.
Covers direct injection (user input manipulating model behavior) and indirect
injection (external content containing hidden instructions). Auto-invoked when
injection (external content containing hidden instructions), as well as multimodal injection (images/audio). Auto-invoked when
reviewing LLM applications that process external content, build RAG pipelines,
or accept user input that reaches a language model. Produces a test report with
categorized findings and defense recommendations.
tags: [ai-security, prompt-injection, llm, testing]
tags: [ai-security, prompt-injection, llm, testing, multimodal]
role: [appsec-engineer, security-engineer]
phase: [build, review, operate]
frameworks: [OWASP-LLM01-2025, MITRE-ATLAS]
Expand Down Expand Up @@ -46,6 +46,8 @@ The research community distinguishes two fundamental variants:
- **Direct prompt injection** — The attacker's malicious instructions are submitted directly as user input to the application. First systematically studied by Perez & Ribeiro (2022) in "Ignore Previous Prompt: Attack Techniques For Language Models," this class covers cases where user-controlled text is concatenated into the prompt sent to the LLM.

- **Indirect prompt injection** — The attacker plants malicious instructions in external content that the LLM later retrieves and processes. Greshake et al. (2023) formalized this in "Not What You've Signed Up For: Compromising Real-World LLM-Integrated Applications with Indirect Prompt Injection," demonstrating that poisoned web pages, documents, and emails can hijack LLM behavior when ingested as context.

- **Multimodal prompt injection** — The attacker embeds malicious instructions into non-text modalities, such as adversarial noise in audio files or hidden white-on-white text in images, which are directly parsed by multimodal models (e.g., GPT-4o, Gemini Vision) to execute unauthorized instructions.

Simon Willison's prompt injection taxonomy further refines these categories by documenting real-world attack surfaces and defense limitations, providing practical grounding for security assessments.

Expand All @@ -55,7 +57,7 @@ Simon Willison's prompt injection taxonomy further refines these categories by d

Identify every point where user-supplied or externally sourced content reaches the language model. Produce a complete interaction map covering:

1. **User input channels** — Chat interfaces, form fields, API parameters, file uploads, voice input transcriptions, and any other path where a user directly provides text that is included in an LLM prompt.
1. **User input channels** — Chat interfaces, form fields, API parameters, file/image uploads, audio/voice input, and any other path where a user directly provides text or multimodal data that is included in an LLM prompt.
2. **External content sources** — Web pages fetched by browsing tools, documents loaded into RAG pipelines, email bodies, database records, calendar entries, third-party API responses, and any other data source the LLM reads but the user does not directly control at query time.
3. **System prompt construction** — How the system prompt is assembled, whether it is static or dynamically composed, and whether any user-influenced data (e.g., user profile fields, prior conversation history) is interpolated into it.
4. **Tool and plugin interfaces** — Any tools the LLM can invoke (code execution, web search, file system access, API calls), including what parameters are LLM-controlled and what side effects each tool can produce.
Expand Down Expand Up @@ -92,6 +94,7 @@ For each external content source identified in Step 1, determine whether an adve
- **Database records** — If user-generated content stored in a database is later retrieved as LLM context, any user who can write to that database is an injection vector.
- **File uploads and document processing** — PDFs, spreadsheets, and other documents can contain text that, when extracted and sent to the LLM, functions as injected instructions.
- **API responses** — Third-party APIs whose responses are fed into the LLM context could be compromised or manipulated.
- **Cross-Site Prompt Injection (XSPI) / Agent-to-Agent** — In multi-agent architectures, if Agent A is compromised, it can generate malicious text/context that is passed to Agent B, laterally spreading the injection.

**What to look for in code:**
- Document loaders, web scrapers, or API clients whose output is inserted into prompts
Expand Down Expand Up @@ -142,6 +145,15 @@ The attacker causes the model to include sensitive data in its output or to tran
- Can tool calls be used to send data to arbitrary external endpoints?
- Are outputs filtered for sensitive data patterns?

### 4.6 Multimodal Injection

The attacker uses vision or audio inputs to bypass text-based sanitization entirely. Multimodal models can parse embedded instructions directly from pixels or audio spectrograms.

**What to evaluate:**
- Does the application accept images, audio, or video inputs that are sent to a multimodal model?
- Are there defenses specifically designed to strip hidden text (OCR scanning) or adversarial perturbations from images?
- Does the application blindly trust the model's interpretation of an uploaded image without bounding the resulting actions?

### 4.5 Jailbreaking

The attacker bypasses the model's safety guidelines or the application's behavioral constraints. While jailbreaking the base model is partly a model-provider concern, application-level jailbreaking (circumventing application-specific rules) is the application developer's responsibility.
Expand Down Expand Up @@ -204,6 +216,14 @@ Evaluate which of the following mitigations are implemented and how effectively.

---

### 5.8 AI Firewall / LLM Gateway

- Does the application deploy a dedicated LLM Gateway or AI Firewall (e.g., NeMo Guardrails, Lakera Guard, PromptArmor) as a discrete architectural layer?
- Is there evidence that the gateway inspects both inbound user prompts (for injection) and outbound LLM responses (for data loss/policy violation)?
- Does the firewall enforce strict system/user role separation at the API level (e.g., ChatML roles)?

---

## Step 6: Report Findings

Compile findings into a structured report using the classification and output format below.
Expand Down Expand Up @@ -237,7 +257,7 @@ Each finding should be assigned a severity based on potential impact:
### Findings

#### Finding [N]: [Title]
- Category: [Goal Hijacking | Prompt Leaking | Privilege Escalation | Data Exfiltration | Jailbreaking]
- Category: [Goal Hijacking | Prompt Leaking | Privilege Escalation | Data Exfiltration | Jailbreaking | Multimodal Injection | XSPI]
- Vector: [Direct | Indirect]
- Severity: [Critical | High | Medium | Low | Informational]
- Location: [file path and line numbers, or architectural component]
Expand Down Expand Up @@ -267,13 +287,15 @@ Each finding should be assigned a severity based on potential impact:

1. **Testing only direct injection and ignoring indirect injection.** Indirect injection through RAG pipelines, emails, and fetched web content is often a larger attack surface than direct user input. Applications that ingest external content are exposed to any adversary who can influence that content, which is frequently a much broader set of attackers than those with direct application access.

2. **Relying on prompt instructions as a security boundary.** System prompts that say "never reveal these instructions" or "always refuse harmful requests" are not enforceable security controls. They are behavioral suggestions to a probabilistic model. Security-critical constraints must be enforced through code, not through natural language instructions to the LLM.
2. **Ignoring Multimodal Input Surfaces.** Text-based prompt sanitization is useless if an attacker can upload an image containing the text "Ignore previous instructions." Multimodal LLMs read images natively, meaning vision and audio inputs must be treated with the exact same adversarial scrutiny as text fields.

3. **Relying on prompt instructions as a security boundary.** System prompts that say "never reveal these instructions" or "always refuse harmful requests" are not enforceable security controls. They are behavioral suggestions to a probabilistic model. Security-critical constraints must be enforced through code, not through natural language instructions to the LLM.

3. **Assuming input blocklists are sufficient.** Blocklisting known injection phrases (e.g., "ignore previous instructions") is trivially bypassed through paraphrasing, encoding, or language switching. Input validation should focus on allowlisting expected input formats rather than blocklisting known attacks.
4. **Assuming input blocklists are sufficient.** Blocklisting known injection phrases (e.g., "ignore previous instructions") is trivially bypassed through paraphrasing, encoding, or language switching. Input validation should focus on allowlisting expected input formats rather than blocklisting known attacks.

4. **Granting the LLM excessive tool access.** Applications that give the LLM access to powerful tools (file system writes, email sending, database modifications, code execution) without independent authorization checks create high-severity privilege escalation risk. Every tool the LLM can invoke should have its own authorization gate that does not depend on the LLM's judgment.
5. **Granting the LLM excessive tool access.** Applications that give the LLM access to powerful tools (file system writes, email sending, database modifications, code execution) without independent authorization checks create high-severity privilege escalation risk. Every tool the LLM can invoke should have its own authorization gate that does not depend on the LLM's judgment.

5. **Failing to treat retrieved content as untrusted.** RAG pipelines often insert retrieved document chunks directly into the prompt with no distinction from system instructions. The LLM cannot inherently distinguish "this is data to reason about" from "this is an instruction to follow." Retrieved content should be explicitly demarcated and, where possible, processed through a model or layer that enforces instruction hierarchy.
6. **Failing to treat retrieved content as untrusted.** RAG pipelines often insert retrieved document chunks directly into the prompt with no distinction from system instructions. The LLM cannot inherently distinguish "this is data to reason about" from "this is an instruction to follow." Retrieved content should be explicitly demarcated and, where possible, processed through a model or layer that enforces instruction hierarchy.

---

Expand Down
72 changes: 48 additions & 24 deletions skills/incident-response/containment/SKILL.md
Original file line number Diff line number Diff line change
Expand Up @@ -103,22 +103,24 @@ Short-term containment aims to stop the immediate threat with minimal preparatio

| Strategy | Method | Use When | Limitations |
|----------|--------|----------|-------------|
| **Port shutdown** | Disable switchport or cloud security group ingress/egress | Single host compromise, not business-critical | Disrupts all services on the host |
| **Port shutdown** | Disable switchport or physical interface | Single host compromise, not business-critical | Disrupts all services on the host |
| **VLAN isolation** | Move host to quarantine VLAN with restricted routing | Need to maintain some connectivity for evidence collection | Requires network team coordination |
| **Firewall rule** | Block specific IPs, ports, or protocols at perimeter or host firewall | Known C2 infrastructure, specific attack vector | Attacker may use alternate C2 channels |
| **DNS sinkholing** | Redirect malicious domains to controlled IP via internal DNS | C2 communication via domain names | Ineffective if attacker uses direct IP communication |
| **Cloud security group lockdown** | Remove all inbound/outbound rules except management access | Cloud instance compromise | May disrupt dependent services |
| **VPN/remote access revocation** | Disable VPN accounts, revoke remote access tokens | Compromised remote access credentials | Disrupts legitimate remote users on same system |
| **Firewall / Proxy rule** | Block specific IPs, ports, or protocols at perimeter or host firewall | Known C2 infrastructure, specific attack vector | Attacker may use alternate C2 or proxy bypass channels |
| **DNS sinkholing** | Redirect malicious domains to controlled IP via internal DNS | C2 communication via domain names | Ineffective if attacker uses direct IP, DoH, or DoT |
| **Cloud SG/NSG lockdown** | Detach old groups, attach isolation group, verify route/NACL | Cloud instance or workload compromise | Must verify attachment to effective path |
| **Kubernetes CNI block** | Deploy NetworkPolicy blocking all ingress/egress | Container/Pod compromise | CNI must actually enforce the policy |
| **VPN/remote access block** | Disable VPN accounts, revoke remote access tokens | Compromised remote access | Disrupts legitimate remote users |

**Credential revocation strategies:**

| Strategy | Method | Use When | Scope |
|----------|--------|----------|-------|
| **Password reset** | Force password change for compromised accounts | Credential theft confirmed or suspected | Individual accounts |
| **Session invalidation** | Revoke all active sessions and tokens for affected accounts | Session hijacking, token theft | Individual accounts |
| **Password reset** | Force password change for compromised accounts | Credential theft confirmed or suspected | Identity provider |
| **Session & Token revocation** | Revoke active IdP sessions, refresh tokens, and downstream SaaS sessions | Session hijacking, token theft | Identity provider + SaaS apps |
| **App Grant / OAuth revocation** | Review and revoke malicious OAuth app consents | Illicit consent grants, app abuse | Identity provider |
| **API key rotation** | Generate new API keys, revoke old keys | API key exposure or misuse | Specific services |
| **Certificate revocation** | Revoke and reissue TLS/mTLS certificates | Certificate compromise, CA compromise | Services using the certificate |
| **Service account reset** | Reset service account passwords and regenerate keys | Lateral movement via service accounts | Downstream services may break |
| **Certificate / SSH key revocation** | Revoke and reissue TLS/mTLS certificates or SSH keys | Certificate compromise, key theft | Services or hosts using the credential |
| **Service principal reset** | Reset service principal secrets and regenerate keys | Lateral movement via service accounts | Downstream services may break |
| **Kerberos ticket reset** | Reset krbtgt account password (twice, per Microsoft guidance) | Golden ticket attack, domain compromise | Domain-wide impact; requires careful planning |
| **MFA token reset** | Deregister and re-enroll MFA devices | MFA bypass, SIM swap, device compromise | Individual users |

Expand Down Expand Up @@ -201,20 +203,39 @@ Wiper and destructive malware require a distinct containment approach from ranso

**Key difference from ransomware containment:** Do not attempt to "monitor and observe" a wiper in progress. Every second of observation is data permanently destroyed. Aggressive, immediate containment is always the correct posture for confirmed wiper activity.

### Step 5: Containment Validation
### Step 5: Containment Validation & Effective Enforcement Evidence

After implementing containment, verify effectiveness before proceeding to eradication.
After implementing containment, verify effectiveness before proceeding to eradication. An action being "sent" or a ticket being "resolved" is not sufficient. You must collect **Effective Enforcement Evidence** from the provider or endpoint.

**Validation checklist:**
#### Network Containment Evidence
| Target Asset | Control Plane | Enforcement Point | Attachment/Scope Proof | Protocol Coverage | State | Telemetry Source | Result |
|---|---|---|---|---|---|---|---|
| [Instance/VM ID] | [e.g., AWS EC2] | [e.g., Security Group] | [Verified attached to ENI] | [All Egress/Ingress Deny] | [Active] | [VPC Flow Logs] | [Blocked connections observed] |
| [Pod/Namespace] | [e.g., Kubernetes] | [e.g., NetworkPolicy] | [CNI plugin enforcement verified] | [Ingress/Egress Deny] | [Applied] | [Cilium/Calico logs] | [Dropped packets observed] |

| Check | Method | Expected Result |
|-------|--------|----------------|
| C2 communication blocked | Monitor network traffic for C2 indicators | No outbound connections to known C2 IPs/domains |
| Lateral movement blocked | Monitor authentication logs and network flows between segments | No unauthorized cross-segment authentication |
| Compromised credentials revoked | Attempt authentication with known-compromised credentials | Authentication fails |
| Attacker persistence neutralized | Scan for known persistence mechanisms | No active persistence artifacts |
| Business services operational (if surgical containment) | Verify critical service health checks | Services responding normally |
| Evidence preserved | Verify forensic images and memory dumps are intact and hashed | Hash verification passes |
#### Identity Containment Evidence
| Target Identity | Control Plane | Enforcement Point | Scope Proof | State | Telemetry Source | Result |
|---|---|---|---|---|---|---|
| [User/Service] | [e.g., Entra ID] | [Password & Sessions] | [Password changed, Refresh tokens revoked] | [Revoked] | [Sign-in Logs] | [Active sessions terminated] |
| [OAuth App] | [e.g., Google Workspace] | [App Grants] | [Specific app ID revoked] | [Revoked] | [Audit Logs] | [Token refresh failures] |

#### DNS Containment Evidence
| Target Asset | Control Plane | Enforcement Point | Scope Proof | State | Telemetry Source | Result |
|---|---|---|---|---|---|---|
| [Endpoint/Subnet] | [e.g., Internal DNS] | [e.g., DNS RPZ] | [Endpoint uses this resolver path, no DoH/DoT bypass] | [Active] | [DNS Query Logs] | [Sinkhole IP returned for C2] |

#### EDR Containment Evidence
| Target Asset | Control Plane | Enforcement Point | Scope Proof | State | Telemetry Source | Result |
|---|---|---|---|---|---|---|
| [Hostname] | [e.g., CrowdStrike] | [Agent Isolation] | [Exceptions mapped correctly] | [Acknowledged by endpoint] | [EDR Console] | [Endpoint offline to non-mgmt traffic] |

#### Fallback and Escalation Criteria

If primary containment validation is pending or fails, immediately trigger fallback criteria:
- **EDR Isolation Pending / Endpoint Offline:** Fall back to Network Containment (switchport disable, cloud SG isolation, or VPN disconnect).
- **Cloud Rule Not Attached / Telemetry Unavailable:** Escalate to broader subnet/VPC isolation or physically power down the instance if critical.
- **DNS Sinkhole Bypassed (e.g., DoH in use):** Fall back to Egress Filtering at the firewall dropping all port 443 traffic to unknown destinations.
- **Session Revocation Failed:** Disable the account entirely rather than just resetting the password and clearing tokens.

**Containment failure indicators:**
- New C2 connections from previously unknown infrastructure
Expand Down Expand Up @@ -289,10 +310,13 @@ threat severity and business criticality, and expected impact on operations.]
|---|---|---|---|
| [Service] | [Description of disruption] | [Workaround if any] | [Yes/No -- requires escalation] |

### Containment Validation Checklist
| Check | Result | Timestamp |
|---|---|---|
| [Validation item] | [Pass/Fail/Pending] | [timestamp] |
### Effective Enforcement Evidence
| Containment Action | Target Asset | Enforcement Point | Scope/State Proof | Telemetry Source | Result/Timestamp |
|---|---|---|---|---|---|
| [Action Name] | [Asset ID] | [Control Plane/Tool] | [Proof of Attachment/Revocation] | [Log Source] | [Pass/Fail/Pending - Timestamp] |

### Fallback Actions Activated
- [List any fallback actions triggered due to pending/failed primary containment]

### Rollback Conditions
[Document specific conditions under which containment will be modified or rolled back]
Expand Down