Skip to content

nextframedev/ai_security_lab

Repository files navigation

ai_security_lab

ai_security_lab is a small companion project for a book about securing AI applications.

The project is designed to support a simple teaching arc:

  1. start with a vulnerable AI support workflow
  2. show how prompt injection, unsafe tools, and data leakage happen
  3. harden the same workflow with explicit controls
  4. test the controls instead of trusting a prompt alone

The first scaffold stays small and deterministic by default. It starts CLI-first, but it also includes a minimal local web/API surface and an optional provider-backed reply mode for later inspection and live-model examples.

The companion project is intentionally narrower than a full AI governance or lifecycle reference. Its purpose is to show how AI application workflows fail in practice, and how to harden those workflows with visible code, tests, and explicit boundaries.

Project Shape

The lab is organized around a fictional support assistant:

  • examples/data/ fictional customers and tickets
  • examples/policies/ retrievable support and security policies
  • examples/uploads/ untrusted user-supplied content, including malicious prompt-injection cases
  • examples/evals/ bundled regression cases for workflow, payload, and tool behavior
  • src/ai_security_lab/workflows.py vulnerable, hardened, and provider-backed assistant flows
  • src/ai_security_lab/security.py redaction, role checks, and approval checks
  • src/ai_security_lab/validation.py structured-output validation for AI-generated reply payloads
  • src/ai_security_lab/provider.py optional OpenAI-compatible provider client for live-model reply mode
  • src/ai_security_lab/tools.py small tool contracts for sensitive actions
  • src/ai_security_lab/audit.py append-only local audit logging
  • src/ai_security_lab/evals.py small behavior-eval runner for workflow, payload, and tool cases
  • src/ai_security_lab/web.py small local API for reply, validation, export, and audit scenarios
  • tests/ regression checks for the controls

Teaching Scope

The scaffold is meant to support chapters such as:

  1. prompt injection
  2. RAG poisoning and malicious documents
  3. sensitive data leakage
  4. unsafe tool calls
  5. output validation
  6. approval gates
  7. audit logging
  8. structured-output validation

Book Snippet Map

If you are turning the lab into book material, the most useful excerpts are small before-and-after pairs. A few of the vulnerable snippets below are taken directly from the lab, and a few are intentionally reduced "naive" variants so the security boundary fits on one page.

1. Prompt Injection Through Uploaded Notes

Vulnerable:

uploaded_note = load_uploaded_note(paths, ticket.uploaded_note_path)
retrieved = retrieve_policies(f"{ticket.subject} {ticket.body} {uploaded_note}", policy_docs)
if looks_like_prompt_injection(uploaded_note):
    reply += " Uploaded note requested hidden data, so the workflow revealed internal notes."

Hardened:

uploaded_note = load_uploaded_note(paths, ticket.uploaded_note_path)
retrieved = retrieve_policies(
    ticket.body,
    policy_docs,
    allowed_names=allowed_policy_names_for_role(role),
)
if looks_like_prompt_injection(uploaded_note):
    notes.append("Uploaded note triggered prompt-injection handling.")

Use this pair to show that uploaded content must be treated as untrusted data, not merged into the same instruction path as trusted workflow context.

2. Sensitive Data Leakage Versus Redaction

Vulnerable:

reply = (
    f"Draft reply for {customer.full_name}: acknowledge the issue and reassure the customer. "
    f"Internal notes: {customer.internal_notes}"
)

Hardened:

safe_customer = redact_customer(customer)
reply = (
    f"Draft reply for {safe_customer['full_name']}: acknowledge the issue, restate the public next "
    "step, and avoid disclosing internal-only details."
)

This is a compact way to teach that "do not leak secrets" is usually a data preparation problem before it becomes a prompt problem.

3. Unvalidated Output Versus Structured Validation

Vulnerable:

return {
    "customer_reply": "We will review the outage and share the next update shortly.",
    "allowed_action": "export_customer_report",
    "citations": ["security-policy.md"],
    "internal_notes": customer.internal_notes,
}

Hardened:

candidate = {
    "customer_reply": result.reply,
    "allowed_action": "reply_only",
    "citations": sorted(allowed_citations),
}
return validate_support_reply_payload(candidate, allowed_citations=allowed_citations)

Use this pair to show that plausible JSON is still untrusted output until it passes schema, action, and citation checks.

4. Naive Sensitive Tool Use Versus Approval Gates

Naive chapter version:

def export_customer_report(paths, customer_id):
    customer = get_customer(paths, customer_id)
    return {"customer": customer}

Hardened:

if not can_export_customer_report(role):
    raise PermissionError(f"Role {role} may not export customer reports.")
if requires_approval_for_export(role) and not approved:
    raise PermissionError("Manager export requires explicit approval.")
return {
    "customer": redact_customer(customer),
    "notes": ["Returned redacted customer view for export flow."],
}

This chapter pair works well for showing that tool safety usually needs both authorization and an explicit approval boundary for higher-risk actions.

5. Naive Provider Output Versus Post-Generation Safety Checks

Naive chapter version:

reply = generate_chat_reply_openai_compatible(...)
return WorkflowResult(reply=reply, evidence=evidence, notes=notes, leaked_internal_notes=False)

Hardened:

reply = generate_chat_reply_openai_compatible(...)
safety_issues = provider_reply_safety_issues(reply, customer=customer)
if safety_issues:
    raise RuntimeError(
        "Provider reply failed post-generation safety checks: "
        + "; ".join(safety_issues)
    )

Use this pair to show that a live model path still needs deterministic checks after generation, especially when the model saw untrusted text.

6. No Audit Trail Versus Visible Security Events

Naive chapter version:

return WorkflowResult(reply=reply, evidence=evidence, notes=notes, leaked_internal_notes=False)

Hardened:

append_audit_event(
    paths,
    AuditEvent(
        event_type="support_reply",
        actor_role=role,
        resource_id=ticket.ticket_id,
        outcome="allowed",
        details={"prompt_injection_detected": True, "evidence": evidence},
    ),
)

This pair helps readers see that hardening is not only about blocking bad behavior. It is also about leaving a reviewable trail when risky paths are used or blocked.

Quick Start

python3 -m venv .venv
source .venv/bin/activate
pip install -e '.[dev]'
pytest
python -m ai_security_lab.cli vulnerable-reply --ticket-id T-1001
python -m ai_security_lab.cli hardened-reply --ticket-id T-1001 --role frontline
python -m ai_security_lab.cli provider-reply --ticket-id T-1001 --role frontline \
  --llm-endpoint http://127.0.0.1:11434/v1/chat/completions \
  --llm-model llama3.2
python -m ai_security_lab.web

Initial Commands

Vulnerable flow:

python -m ai_security_lab.cli vulnerable-reply --ticket-id T-1001

Hardened flow:

python -m ai_security_lab.cli hardened-reply --ticket-id T-1001 --role frontline

Provider-backed hardened flow:

python -m ai_security_lab.cli provider-reply \
  --ticket-id T-1001 \
  --role frontline \
  --llm-endpoint http://127.0.0.1:11434/v1/chat/completions \
  --llm-model llama3.2

The provider-backed path reuses the hardened boundaries. It still redacts customer data, filters retrieval by role, records an audit event, and treats the uploaded note as untrusted text. It also runs a small post-generation safety check and rejects replies that appear to disclose internal-only content. The endpoint and model can also be set through:

  • AI_SECURITY_LAB_LLM_ENDPOINT
  • AI_SECURITY_LAB_LLM_MODEL
  • AI_SECURITY_LAB_API_KEY

Testing With A Live Model

The provider-backed path expects an OpenAI-compatible chat completions endpoint. You can pass settings on the command line or through environment variables.

Set environment variables:

export AI_SECURITY_LAB_LLM_ENDPOINT=http://127.0.0.1:11434/v1/chat/completions
export AI_SECURITY_LAB_LLM_MODEL=llama3.2
# export AI_SECURITY_LAB_API_KEY=...   # only if your provider requires one

Run the provider-backed CLI flow:

python -m ai_security_lab.cli provider-reply --ticket-id T-1001 --role frontline

Or pass the provider settings directly:

python -m ai_security_lab.cli provider-reply \
  --ticket-id T-1001 \
  --role frontline \
  --llm-endpoint http://127.0.0.1:11434/v1/chat/completions \
  --llm-model llama3.2

Expected behavior:

  • the reply still uses role-filtered retrieval
  • customer data is redacted before the model-visible prompt
  • internal-only content should not appear in the final reply
  • the workflow returns an error if post-generation safety checks fail

Validate a structured reply payload:

python -m ai_security_lab.cli validate-payload \
  --mode hardened \
  --ticket-id T-1001 \
  --role frontline

Run the bundled security eval set:

python -m ai_security_lab.cli run-evals

Sensitive tool example:

python -m ai_security_lab.cli export-report \
  --customer-id C-1001 \
  --role manager \
  --approved

Run the local web/API surface:

python -m ai_security_lab.web

Then open:

http://127.0.0.1:8010

The local API exposes:

  • GET /api/health
  • GET /api/audit
  • POST /api/reply/vulnerable
  • POST /api/reply/hardened
  • POST /api/reply/provider
  • POST /api/validate/reply
  • POST /api/export-report

Example hardened reply request:

curl -X POST http://127.0.0.1:8010/api/reply/hardened \
  -H "content-type: application/json" \
  -d '{
    "ticket_id": "T-1001",
    "role": "frontline"
  }'

For the frontline role, that path keeps manager-exceptions.md out of the returned evidence.

Example provider-backed reply request:

curl -X POST http://127.0.0.1:8010/api/reply/provider \
  -H "content-type: application/json" \
  -d '{
    "ticket_id": "T-1001",
    "role": "frontline",
    "llm_endpoint": "http://127.0.0.1:11434/v1/chat/completions",
    "llm_model": "llama3.2"
  }'

Why This Shape Fits The Book

The companion app is intentionally small, but it still has enough moving parts to surface real security boundaries:

  • untrusted uploaded content
  • retrievable policy content
  • retrieval filters based on role
  • role-based tool access
  • redaction before model-visible output
  • structured-output validation before downstream use
  • explicit approvals for risky actions
  • audit logs for review

That is enough to show how an AI application can fail, and how the same app can be tightened step by step.

Notes On Current Behavior

  • Hardened structured replies are validated against retrieved evidence, so citation names have to match the policy files the workflow actually used.
  • Structured-output validation also rejects calm, plausible reply payloads when they ask for an unsafe downstream action.
  • The bundled eval set in examples/evals/security_cases.jsonl gives a small regression pass across workflow, payload, and tool scenarios.
  • The deterministic flows remain the default teaching baseline.
  • The optional provider-reply path turns the same hardened workflow into a live AI path through an OpenAI-compatible chat endpoint.
  • Uploaded-note paths are constrained to the local examples/uploads/ directory.
  • Provider request failures surface as errors from the CLI and as 502 responses from the local API.
  • Malformed API request fields return structured 400 responses instead of bubbling up as uncaught errors.

License

MIT

Books by the Authors

QR code to our books on Amazon
Scan to check out our books on Amazon

About

A hands-on lab demonstrating how AI application workflows fail — and how to harden them with explicit controls, tests, and boundaries.

Resources

License

Contributing

Security policy

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages