ai_security_lab

ai_security_lab is a small companion project for a book about securing AI applications.

The project is designed to support a simple teaching arc:

start with a vulnerable AI support workflow
show how prompt injection, unsafe tools, and data leakage happen
harden the same workflow with explicit controls
test the controls instead of trusting a prompt alone

The first scaffold stays small and deterministic by default. It starts CLI-first, but it also includes a minimal local web/API surface and an optional provider-backed reply mode for later inspection and live-model examples.

The companion project is intentionally narrower than a full AI governance or lifecycle reference. Its purpose is to show how AI application workflows fail in practice, and how to harden those workflows with visible code, tests, and explicit boundaries.

Project Shape

The lab is organized around a fictional support assistant:

examples/data/ fictional customers and tickets
examples/policies/ retrievable support and security policies
examples/uploads/ untrusted user-supplied content, including malicious prompt-injection cases
examples/evals/ bundled regression cases for workflow, payload, and tool behavior
src/ai_security_lab/workflows.py vulnerable, hardened, and provider-backed assistant flows
src/ai_security_lab/security.py redaction, role checks, and approval checks
src/ai_security_lab/validation.py structured-output validation for AI-generated reply payloads
src/ai_security_lab/provider.py optional OpenAI-compatible provider client for live-model reply mode
src/ai_security_lab/tools.py small tool contracts for sensitive actions
src/ai_security_lab/audit.py append-only local audit logging
src/ai_security_lab/evals.py small behavior-eval runner for workflow, payload, and tool cases
src/ai_security_lab/web.py small local API for reply, validation, export, and audit scenarios
tests/ regression checks for the controls

Teaching Scope

The scaffold is meant to support chapters such as:

prompt injection
RAG poisoning and malicious documents
sensitive data leakage
unsafe tool calls
output validation
approval gates
audit logging
structured-output validation

Book Snippet Map

If you are turning the lab into book material, the most useful excerpts are small before-and-after pairs. A few of the vulnerable snippets below are taken directly from the lab, and a few are intentionally reduced "naive" variants so the security boundary fits on one page.

1. Prompt Injection Through Uploaded Notes

Vulnerable:

uploaded_note = load_uploaded_note(paths, ticket.uploaded_note_path)
retrieved = retrieve_policies(f"{ticket.subject} {ticket.body} {uploaded_note}", policy_docs)
if looks_like_prompt_injection(uploaded_note):
    reply += " Uploaded note requested hidden data, so the workflow revealed internal notes."

Hardened:

uploaded_note = load_uploaded_note(paths, ticket.uploaded_note_path)
retrieved = retrieve_policies(
    ticket.body,
    policy_docs,
    allowed_names=allowed_policy_names_for_role(role),
)
if looks_like_prompt_injection(uploaded_note):
    notes.append("Uploaded note triggered prompt-injection handling.")

Use this pair to show that uploaded content must be treated as untrusted data, not merged into the same instruction path as trusted workflow context.

2. Sensitive Data Leakage Versus Redaction

Vulnerable:

reply = (
    f"Draft reply for {customer.full_name}: acknowledge the issue and reassure the customer. "
    f"Internal notes: {customer.internal_notes}"
)

Hardened:

safe_customer = redact_customer(customer)
reply = (
    f"Draft reply for {safe_customer['full_name']}: acknowledge the issue, restate the public next "
    "step, and avoid disclosing internal-only details."
)

This is a compact way to teach that "do not leak secrets" is usually a data preparation problem before it becomes a prompt problem.

3. Unvalidated Output Versus Structured Validation

Vulnerable:

return {
    "customer_reply": "We will review the outage and share the next update shortly.",
    "allowed_action": "export_customer_report",
    "citations": ["security-policy.md"],
    "internal_notes": customer.internal_notes,
}

Hardened:

candidate = {
    "customer_reply": result.reply,
    "allowed_action": "reply_only",
    "citations": sorted(allowed_citations),
}
return validate_support_reply_payload(candidate, allowed_citations=allowed_citations)

Use this pair to show that plausible JSON is still untrusted output until it passes schema, action, and citation checks.

4. Naive Sensitive Tool Use Versus Approval Gates

Naive chapter version:

def export_customer_report(paths, customer_id):
    customer = get_customer(paths, customer_id)
    return {"customer": customer}

Hardened:

if not can_export_customer_report(role):
    raise PermissionError(f"Role {role} may not export customer reports.")
if requires_approval_for_export(role) and not approved:
    raise PermissionError("Manager export requires explicit approval.")
return {
    "customer": redact_customer(customer),
    "notes": ["Returned redacted customer view for export flow."],
}

This chapter pair works well for showing that tool safety usually needs both authorization and an explicit approval boundary for higher-risk actions.

5. Naive Provider Output Versus Post-Generation Safety Checks

Naive chapter version:

reply = generate_chat_reply_openai_compatible(...)
return WorkflowResult(reply=reply, evidence=evidence, notes=notes, leaked_internal_notes=False)

Hardened:

reply = generate_chat_reply_openai_compatible(...)
safety_issues = provider_reply_safety_issues(reply, customer=customer)
if safety_issues:
    raise RuntimeError(
        "Provider reply failed post-generation safety checks: "
        + "; ".join(safety_issues)
    )

Use this pair to show that a live model path still needs deterministic checks after generation, especially when the model saw untrusted text.

6. No Audit Trail Versus Visible Security Events

Naive chapter version:

return WorkflowResult(reply=reply, evidence=evidence, notes=notes, leaked_internal_notes=False)

Hardened:

append_audit_event(
    paths,
    AuditEvent(
        event_type="support_reply",
        actor_role=role,
        resource_id=ticket.ticket_id,
        outcome="allowed",
        details={"prompt_injection_detected": True, "evidence": evidence},
    ),
)

This pair helps readers see that hardening is not only about blocking bad behavior. It is also about leaving a reviewable trail when risky paths are used or blocked.

Quick Start

python3 -m venv .venv
source .venv/bin/activate
pip install -e '.[dev]'
pytest
python -m ai_security_lab.cli vulnerable-reply --ticket-id T-1001
python -m ai_security_lab.cli hardened-reply --ticket-id T-1001 --role frontline
python -m ai_security_lab.cli provider-reply --ticket-id T-1001 --role frontline \
  --llm-endpoint http://127.0.0.1:11434/v1/chat/completions \
  --llm-model llama3.2
python -m ai_security_lab.web

Initial Commands

Vulnerable flow:

python -m ai_security_lab.cli vulnerable-reply --ticket-id T-1001

Hardened flow:

python -m ai_security_lab.cli hardened-reply --ticket-id T-1001 --role frontline

Provider-backed hardened flow:

python -m ai_security_lab.cli provider-reply \
  --ticket-id T-1001 \
  --role frontline \
  --llm-endpoint http://127.0.0.1:11434/v1/chat/completions \
  --llm-model llama3.2

The provider-backed path reuses the hardened boundaries. It still redacts customer data, filters retrieval by role, records an audit event, and treats the uploaded note as untrusted text. It also runs a small post-generation safety check and rejects replies that appear to disclose internal-only content. The endpoint and model can also be set through:

AI_SECURITY_LAB_LLM_ENDPOINT
AI_SECURITY_LAB_LLM_MODEL
AI_SECURITY_LAB_API_KEY

Testing With A Live Model

The provider-backed path expects an OpenAI-compatible chat completions endpoint. You can pass settings on the command line or through environment variables.

Set environment variables:

export AI_SECURITY_LAB_LLM_ENDPOINT=http://127.0.0.1:11434/v1/chat/completions
export AI_SECURITY_LAB_LLM_MODEL=llama3.2
# export AI_SECURITY_LAB_API_KEY=...   # only if your provider requires one

Run the provider-backed CLI flow:

python -m ai_security_lab.cli provider-reply --ticket-id T-1001 --role frontline

Or pass the provider settings directly:

python -m ai_security_lab.cli provider-reply \
  --ticket-id T-1001 \
  --role frontline \
  --llm-endpoint http://127.0.0.1:11434/v1/chat/completions \
  --llm-model llama3.2

Expected behavior:

the reply still uses role-filtered retrieval
customer data is redacted before the model-visible prompt
internal-only content should not appear in the final reply
the workflow returns an error if post-generation safety checks fail

Validate a structured reply payload:

python -m ai_security_lab.cli validate-payload \
  --mode hardened \
  --ticket-id T-1001 \
  --role frontline

Run the bundled security eval set:

python -m ai_security_lab.cli run-evals

Sensitive tool example:

python -m ai_security_lab.cli export-report \
  --customer-id C-1001 \
  --role manager \
  --approved

Run the local web/API surface:

python -m ai_security_lab.web

Then open:

http://127.0.0.1:8010

The local API exposes:

GET /api/health
GET /api/audit
POST /api/reply/vulnerable
POST /api/reply/hardened
POST /api/reply/provider
POST /api/validate/reply
POST /api/export-report

Example hardened reply request:

curl -X POST http://127.0.0.1:8010/api/reply/hardened \
  -H "content-type: application/json" \
  -d '{
    "ticket_id": "T-1001",
    "role": "frontline"
  }'

For the frontline role, that path keeps manager-exceptions.md out of the returned evidence.

Example provider-backed reply request:

curl -X POST http://127.0.0.1:8010/api/reply/provider \
  -H "content-type: application/json" \
  -d '{
    "ticket_id": "T-1001",
    "role": "frontline",
    "llm_endpoint": "http://127.0.0.1:11434/v1/chat/completions",
    "llm_model": "llama3.2"
  }'

Why This Shape Fits The Book

The companion app is intentionally small, but it still has enough moving parts to surface real security boundaries:

untrusted uploaded content
retrievable policy content
retrieval filters based on role
role-based tool access
redaction before model-visible output
structured-output validation before downstream use
explicit approvals for risky actions
audit logs for review

That is enough to show how an AI application can fail, and how the same app can be tightened step by step.

Notes On Current Behavior

Hardened structured replies are validated against retrieved evidence, so citation names have to match the policy files the workflow actually used.
Structured-output validation also rejects calm, plausible reply payloads when they ask for an unsafe downstream action.
The bundled eval set in examples/evals/security_cases.jsonl gives a small regression pass across workflow, payload, and tool scenarios.
The deterministic flows remain the default teaching baseline.
The optional provider-reply path turns the same hardened workflow into a live AI path through an OpenAI-compatible chat endpoint.
Uploaded-note paths are constrained to the local examples/uploads/ directory.
Provider request failures surface as errors from the CLI and as 502 responses from the local API.
Malformed API request fields return structured 400 responses instead of bubbling up as uncaught errors.

License

MIT

Books by the Authors

Scan to check out our books on Amazon

Name		Name	Last commit message	Last commit date
Latest commit History 1 Commit
.github/workflows		.github/workflows
assets		assets
examples		examples
src/ai_security_lab		src/ai_security_lab
tests		tests
.gitignore		.gitignore
CONTRIBUTING.md		CONTRIBUTING.md
LICENSE		LICENSE
README.md		README.md
SECURITY.md		SECURITY.md
pyproject.toml		pyproject.toml

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

ai_security_lab

Project Shape

Teaching Scope

Book Snippet Map

1. Prompt Injection Through Uploaded Notes

2. Sensitive Data Leakage Versus Redaction

3. Unvalidated Output Versus Structured Validation

4. Naive Sensitive Tool Use Versus Approval Gates

5. Naive Provider Output Versus Post-Generation Safety Checks

6. No Audit Trail Versus Visible Security Events

Quick Start

Initial Commands

Testing With A Live Model

Why This Shape Fits The Book

Notes On Current Behavior

License

Books by the Authors

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

ai_security_lab

Project Shape

Teaching Scope

Book Snippet Map

1. Prompt Injection Through Uploaded Notes

2. Sensitive Data Leakage Versus Redaction

3. Unvalidated Output Versus Structured Validation

4. Naive Sensitive Tool Use Versus Approval Gates

5. Naive Provider Output Versus Post-Generation Safety Checks

6. No Audit Trail Versus Visible Security Events

Quick Start

Initial Commands

Testing With A Live Model

Why This Shape Fits The Book

Notes On Current Behavior

License

Books by the Authors

About

Resources

License

Contributing

Security policy

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages