AI-driven bug bounty automation for HackerOne — research preview
Huntress is a Tauri 2.0 desktop application that automates HackerOne bug bounty hunting through a coordinated fleet of AI agents. A primary orchestrator (Claude Opus 4.6) ingests a program's scope, plans an attack strategy, and dispatches 27 specialized vulnerability-hunting agents — each running its own ReAct loop on a cost-tiered model (Haiku or Sonnet). All target interactions pass through a Rust scope validator, a default-deny security model, and a human approval gate.
This is a research preview, not a production tool. The platform infrastructure is solid and well-tested, but no Huntress-generated report has been triaged on a real HackerOne program yet — that is the next milestone, not a past achievement. Use accordingly.
| Indicator | Value | Source |
|---|---|---|
| XBOW benchmark | 38 / 103 challenges solved (36.89%) | v18 second published run, 2026-05-09 (commit 6e8450e) — 52% L1, 27% L2, 13% L3; $58.05; 5h33m; Sonnet 4.6 |
| HackerOne triaged submissions | 0 | First submission is an open milestone |
| TypeScript tests | 3,857 passing / 18 skipped / 0 failing | npx vitest run (169 files) |
| Rust tests | 159 passing / 0 failing / 1 ignored | cd src-tauri && cargo test |
| Type / lint | `tsc --noEmit` clean · `cargo clippy --all-targets -- -D warnings` clean | CI |
| Open audit findings | 0 | All 9 BLOCKERs, 7 HIGH-severity, and 8 MED-severity items closed across v15–v17 |
| Hunt history | 12 sessions (6 OWASP Juice Shop training + 6 real-world H1 programs) | Internal logs |
The XBOW number above is the v18 second published run — the first full 103-challenge completion (v13.3 aborted at 55/103 by credit exhaustion). L2 jumped from 2% → 27% and L3 from 0% → 13% versus v13.3. XSS remains the largest capability gap at 4% solve rate; see PIPELINE.md §13 for the full per-category breakdown.
- Wave 7 (2026-05-18 — 2026-05-19) — Anthropic CVP integration: official Cyber Verification Program approval received 2026-05-13 for org `64d13134-7083-4aca-a808-fb16a2d6b9d0` (dual-use cybersecurity activities, with mass-exfil and ransomware as enforced carve-outs). 9 commits shipped: A7 CVP-aware authorization preamble; A1a centralized payload library (14 carve-out safety gates); A1b orchestrator wires `cvpStatus` end-to-end; A1c-A1e per-agent payload-library migrations across 8 hunter agents (xss / sqli / ssrf / ssti / cmd_injection / proto_pollution / deserialization / path_traversal); B1+B2 SSO realm 3-state oracle + cross-realm token validators; B3+B4+B7+B8+B9 deterministic validator bundle (uuidv1_leak / cors_method_misadvertised / auth_bypass_headers / http_verb_confusion / admin_debug_paths — engagement-derived from Quixel SSO + KWS findings); C5 adaptive agent dispatch driven by Phase 0.5 disclosed-report dedup analysis (avoid-list agents dropped, underserved-list agents priority-boosted). Set `HUNTRESS_CVP_GRANTED=1` and `HUNTRESS_CVP_ORG_ID=<approved-org-id>` in the environment to activate aggressive payloads across all 8 migrated agents. Default behavior is unchanged when these are unset (back-compat).
- Wave 6 (2026-05-17 — 2026-05-18) — engagement-derived items (10 commits): program_selector saturation/asset-type scoring; Phase 0.5 disclosed-report dedup analysis library; orchestrator EV checkpoint wiring; Salesforce Aura RPC validator; auth_param_validation validator; SPA bundle endpoint extraction; acquisition / shared-infra detector; Subzy false-positive filter; WAF bypass budget cap (20 min per target); mobile APK endpoint extractor (apktool + strings + scoped grep + redacted credentials).
- v17 (May 2026) — closed every remaining MED audit finding: IPv4-mapped-IPv6 CIDR bypass, IDN homoglyph regression guards, a real 32-bit aliasing bug in the duplicate-checker SimHash, severity-predictor calibration staleness warnings, an HTTPS-aware readiness probe (`http://` and `https://` probed in parallel), `condition: service_healthy` regex hardening with a lookahead anchor, and 13 new Rust tests for sandbox + agent_browser invariants.
- v16 (May 2026) — closed all remaining HIGH audit findings: 9 deterministic validators for previously pass-through vulnerability types (CSRF, OAuth redirect_uri, info disclosure, rate-limit bypass, CRLF, blind boolean SQLi, MFA bypass, SAML, WebSocket); 20 new HackerOne-shape report templates; chain-summarizer evidence preservation (FLAG, secrets, endpoints survive context pruning); open-redirect validator hardening (`javascript:`-scheme + protocol-relative `//host` detection).
- v15 (May 2026) — closed all 9 BLOCKERs from a comprehensive program audit: dynamic Docker compose service discovery, expanded body-snippet capture (15 KB), production submission pathway wired to the report-template system, broken-access validator now requires differential proof.
A single source of truth for all open and closed work lives in PIPELINE.md. A research-derived backlog distilled from the team's accumulated bug-bounty methodology corpus is summarized at PIPELINE.md §3.6 (P1-6) and tracked in detail in the project's Obsidian vault notes (developer-local).
- Overview
- Benchmark Results
- Architecture
- Capabilities
- Agent Fleet
- Security Model
- Technology Stack
- Installation
- Configuration
- Usage
- Development
- Testing
- Project Structure
- Known Gaps
- Disclaimer
- License
- Acknowledgments
Huntress operates on a coordinator-solver architecture:
- The OrchestratorEngine (Claude Opus 4.6 by default) ingests a HackerOne bounty program's scope, rules, asset list, and bounty table; produces a ranked attack plan; and delegates work to specialist agents via a fire-and-forget dispatch pattern (5 agents in flight, the rest queued).
- Specialist agents run a ReAct loop (Reason + Act) on a tier-appropriate model — Haiku for cheap, deterministic checks (recon, CORS headers, cache, subdomain takeover); Sonnet for harder reasoning (SQLi, XSS, SSRF, IDOR, OAuth, JWT). Tier assignments are locked for security-sensitive agents and cannot be raised by keyword inflation.
- A Rust scope validator (`safe_to_test.rs`, 1,468 LOC) is the single chokepoint for "is this target legal to touch" decisions. Default-deny. Wildcard, CIDR, IPv6, IP-range, and port-list support. 49 dedicated tests.
- A human approval gate sits in front of every state-changing command, with per-category auto-approve rules available for users who want to run without per-command intervention.
- A kill switch (`kill_switch.rs`) provides single-keystroke shutdown, persists across crashes, fails safe to ACTIVE on corrupted state, and tears down all sandbox containers on activation.
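A minimal sketch of the bounded fire-and-forget dispatch described above (five agents in flight, the rest queued), assuming each agent run can be wrapped as an async function. `BoundedDispatcher` and `AgentTask` are hypothetical names; this is an illustration, not the orchestrator's actual `dispatchAgent()` code.

```typescript
// Hypothetical task shape: one async function per agent run.
type AgentTask = () => Promise<void>;

// Bounded, fire-and-forget dispatcher: at most `limit` tasks run at once,
// the rest wait in FIFO order.
class BoundedDispatcher {
  private running = 0;
  private readonly queue: AgentTask[] = [];
  private idleWaiters: Array<() => void> = [];

  constructor(private readonly limit: number = 5) {}

  // Enqueue and return immediately; the caller never awaits the agent.
  dispatch(task: AgentTask): void {
    this.queue.push(task);
    this.pump();
  }

  // Resolves once nothing is running or queued (handy for tests and shutdown).
  idle(): Promise<void> {
    if (this.running === 0 && this.queue.length === 0) return Promise.resolve();
    return new Promise<void>((resolve) => this.idleWaiters.push(resolve));
  }

  // Start queued tasks until the concurrency limit is reached.
  private pump(): void {
    while (this.running < this.limit && this.queue.length > 0) {
      const task = this.queue.shift() as AgentTask;
      this.running++;
      void this.run(task);
    }
  }

  private async run(task: AgentTask): Promise<void> {
    try {
      await task();
    } catch {
      // A failed agent frees its slot; real code would log the error.
    } finally {
      this.running--;
      this.pump();
      if (this.running === 0 && this.queue.length === 0) {
        const waiters = this.idleWaiters;
        this.idleWaiters = [];
        waiters.forEach((resolve) => resolve());
      }
    }
  }
}
```

Swallowing a task's error inside `run()` is a design choice for the sketch: one crashed agent releases its slot instead of stalling the queue.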
The platform supports any combination of providers — Anthropic, OpenAI, Google, OpenRouter, and local Ollama — through a common ModelProvider interface. The default and recommended configuration uses Anthropic Claude exclusively.
| Principle | Implementation |
|---|---|
| User control at every step | Approval gate sits in front of every active-test command; rejection unwinds the agent's plan. |
| Default-deny security posture | Out-of-scope targets, unvalidated commands, and unhealthy proxies are blocked rather than passed through. |
| Multi-model by design | Orchestrator and per-agent models swap with zero code changes; provider failure triggers fallback. |
| Cost-aware | Tiered routing keeps simple agents on Haiku; budget enforcement at 90% (warn) and 100% (hard stop). |
| Desktop-native | Ships as a single double-click app on Linux, macOS, and Windows via Tauri 2.0 — not a CLI. |
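The cost-aware row can be illustrated with a small budget guard. This sketch assumes a single per-hunt budget in USD; `budgetStatus` and `BudgetTracker` are hypothetical, and where Huntress actually enforces the thresholds is not shown here.

```typescript
type BudgetStatus = "ok" | "warn" | "stop";

// Classify cumulative spend against a budget: >= 90% warns, >= 100% stops.
function budgetStatus(spentUsd: number, budgetUsd: number): BudgetStatus {
  // Design choice for this sketch: no positive budget means nothing may be spent.
  if (budgetUsd <= 0) return "stop";
  const ratio = spentUsd / budgetUsd;
  if (ratio >= 1) return "stop";
  if (ratio >= 0.9) return "warn";
  return "ok";
}

// Hypothetical per-hunt tracker consulted before each model call.
class BudgetTracker {
  private spentUsd = 0;

  constructor(private readonly budgetUsd: number) {}

  // Record the cost of a completed call and return the resulting status.
  record(costUsd: number): BudgetStatus {
    this.spentUsd += costUsd;
    return budgetStatus(this.spentUsd, this.budgetUsd);
  }

  // False once the hard stop has been reached.
  canSpend(): boolean {
    return budgetStatus(this.spentUsd, this.budgetUsd) !== "stop";
  }
}
```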
The XBOW Validation Benchmark is a public set of 104 dockerized vulnerability challenges with known flags. It is the closest thing the AI-bug-bounty space has to a standardized eval.
| Run | Score | Date | Commit | Notes |
|---|---|---|---|---|
| v18 | 38 / 103 (36.89%) | 2026-05-09 | 6e8450e | Second published run, first full 103-challenge completion (5h33m, $58.05, Sonnet 4.6). 52% L1 / 27% L2 / 13% L3. |
| v13.3 | 22 / 103 (21.4%) | 2026-05-07 | f479f56 | First published full-corpus attempt; aborted at 55/103 by credit exhaustion (unattempted challenges count as unsolved). |
What the 21.4% means: for each of 103 challenges, a Sonnet-tier specialist agent (selected by attack-tag heuristics) was given 40 ReAct iterations and ~10 minutes of wall-clock time to extract a FLAG{...} token from a freshly-built Docker container. Solved = the agent's reported flag matched the build-arg-injected UUID, OR (when our build-arg injection couldn't be verified) any well-formed FLAG{...} the agent extracted from the running container.
What it does not measure: real-world HackerOne triage outcomes. XBOW challenges are CTFs with known winning paths; HackerOne programs are open-ended targets where finding a valid bug is half the work and writing a triagable report is the other half. A high XBOW score is necessary but not sufficient for first-submission success.
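A sketch of the solve rule described above, assuming the agent's final report is plain text and the injected UUID is the body inside `FLAG{...}`. `isSolved` and the exact "well-formed" pattern are assumptions, not the benchmark runner's code.

```typescript
// Assumed notion of "well-formed": FLAG{...} with a non-empty body that
// contains no whitespace or braces. The capture group is the flag body.
const FLAG_PATTERN = /FLAG\{([^{}\s]+)\}/;

function isSolved(
  reported: string | undefined,
  injectedUuid: string,
  injectionVerified: boolean,
): boolean {
  const match = reported ? FLAG_PATTERN.exec(reported) : null;
  if (!match) return false;
  // Strict path: the flag body must equal the build-arg-injected UUID.
  if (injectionVerified) return match[1] === injectedUuid;
  // Fallback path: any well-formed flag extracted from the container counts.
  return true;
}
```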
The full per-challenge breakdown, agent dispatch traces, and SQLite-persisted run history are stored under ~/.local/share/huntress/benchmarks/xbow/ after a run completes.
# Clone the XBOW benchmark suite (one-time, ~2 GB)
git clone https://github.com/xbow-engineering/validation-benchmarks ~/xbow-benchmark
# Run the full 104-challenge sweep (cost: ~$70-100 in Anthropic API spend, ~6h wall clock)
npm run benchmark:xbow -- --benchmark-dir ~/xbow-benchmark --model claude-sonnet-4-6
Per-challenge timeout, parallelism, and iteration cap are configurable via BenchmarkConfig. Each attempt streams JSONL traces to ~/.local/share/huntress/benchmarks/xbow/<run-id>/<challenge>.jsonl for post-hoc inspection.
+-----------------------------------------------------------------------+
| Huntress Desktop Application |
+-----------------------------------------------------------------------+
| |
| +---------------------------------------------------------------+ |
| | Frontend (React 19 / TypeScript) | |
| | Chat | Agent Status | Findings | Reports | Settings | PTY | |
| +-------------------------------+-------------------------------+ |
| | |
| Tauri IPC Bridge |
| | |
| +-------------------------------v-------------------------------+ |
| | Backend (Rust / Tauri 2.0) | |
| | Scope Validator | PTY Manager | Kill Switch | Sandbox Mgr | |
| | Secure Storage | Proxy Pool | H1 API | Agent Browser | |
| +-------------------------------+-------------------------------+ |
| | |
| +-------------------------------v-------------------------------+ |
| | AI Orchestration Layer (TypeScript) | |
| | OrchestratorEngine | AgentRouter | ReAct Loop | Cost Router | |
| | Adviser | Refiner | Toolcall_fixer | Chain Summarizer | |
| +-------------------------------+-------------------------------+ |
| | |
| +-------------------------------v-------------------------------+ |
| | Specialist Agent Fleet (27) | |
| | XSS · SQLi · SSRF · IDOR · OAuth · JWT · SSTI · XXE · ... | |
| +-------------------------------+-------------------------------+ |
| | |
| +-------------------------------v-------------------------------+ |
| | Data Layer | |
| | Qdrant (vectors) | SQLite (knowledge graph + benchmark) | |
| | OS Keychain (secrets) | Asciinema (audit trail) | |
| +---------------------------------------------------------------+ |
| |
+-----------------------------------------------------------------------+
- The orchestrator builds an ordered task list from the program's scope and attack-surface analysis.
- `dispatchAgent()` fires up to 5 agents in parallel and returns immediately; finding deduplication, validation, and the H1 duplicate check all run fire-and-forget.
- Each agent's ReAct loop iterates: think → call a tool → observe → update plan, with a hard cap (40 iterations by default) and a chain summarizer that pages out old context while preserving high-value evidence (flags, secrets, discovered endpoints).
- Findings flow through 27 deterministic validators (active probes, not LLM grading) before the user ever sees them. Confidence scores are adjusted based on real server evidence — there is no auto-confirm-on-agent-claim shortcut.
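The context paging in the ReAct bullet above can be sketched as follows: keep the newest messages verbatim and fold older ones into a summary that still carries every evidence string they contained. The message shape, `pageOut`, and the evidence regexes (flags, URLs, two example secret formats) are assumptions for illustration; the real chain summarizer's rules are not documented here.

```typescript
// Hypothetical transcript message shape.
interface TranscriptMessage {
  role: "assistant" | "tool";
  content: string;
}

// Patterns treated as high-value evidence in this sketch (assumptions).
const EVIDENCE_PATTERNS: RegExp[] = [
  /FLAG\{[^}\s]+\}/g, // captured flags
  /https?:\/\/[^\s"'<>]+/g, // discovered endpoints
  /\b(?:AKIA[0-9A-Z]{16}|sk-[A-Za-z0-9_-]{20,})\b/g, // example secret shapes
];

// Keep the newest `keep` messages; collapse older ones into one summary
// message that lists every evidence string found in them.
function pageOut(messages: TranscriptMessage[], keep: number): TranscriptMessage[] {
  if (messages.length <= keep) return messages;
  const old = messages.slice(0, messages.length - keep);
  const recent = messages.slice(messages.length - keep);
  const evidence = new Set<string>();
  for (const msg of old) {
    for (const pattern of EVIDENCE_PATTERNS) {
      for (const hit of msg.content.match(pattern) ?? []) evidence.add(hit);
    }
  }
  const preserved = Array.from(evidence).join(", ") || "none";
  const summary: TranscriptMessage = {
    role: "assistant",
    content: `[summary of ${old.length} earlier steps] preserved evidence: ${preserved}`,
  };
  return [summary, ...recent];
}
```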
ModelProvider (interface)
├── AnthropicProvider (Claude Opus 4.x, Sonnet 4.x, Haiku 4.x)
├── OpenAIProvider (GPT-4o, GPT-4o-mini, o3)
├── GoogleProvider (Gemini 2.5 Pro, Gemini Flash)
├── OpenRouterProvider (any model exposed via OpenRouter)
└── LocalProvider (Ollama — Llama, Mistral, Qwen, etc.)
The default configuration uses Anthropic Claude exclusively because the team only validates against Anthropic models. Switching providers requires only a Settings change; no code edit.
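A reduced sketch of the provider abstraction with an ordered fallback chain. The one-method `ModelProvider` here is a simplification (the real interface also covers streaming and key validation), and `completeWithFallback` is a hypothetical helper.

```typescript
// Simplified provider interface for illustration only.
interface ModelProvider {
  readonly name: string;
  complete(prompt: string): Promise<string>;
}

// Try each provider in order and return the first success; if every
// provider fails, surface all errors together.
async function completeWithFallback(
  providers: ModelProvider[],
  prompt: string,
): Promise<{ provider: string; text: string }> {
  const errors: string[] = [];
  for (const provider of providers) {
    try {
      const text = await provider.complete(prompt);
      return { provider: provider.name, text };
    } catch (err) {
      const reason = err instanceof Error ? err.message : String(err);
      errors.push(`${provider.name}: ${reason}`);
    }
  }
  throw new Error(`all providers failed: ${errors.join("; ")}`);
}
```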
- 27 specialist agents (+ 2 utility agents: Auth Worker and Recon) covering OWASP Top 10 plus emerging classes (prototype pollution, prompt injection, SAML signature wrapping, HTTP smuggling, race conditions).
- 66 validator registrations across 62 unique vulnerability types in `validator.ts`. Most are deterministic active probes: XSS uses Playwright dialog detection, SQLi re-executes payloads with a timing diff, SSRF + XXE + command injection use OOB callbacks via `interactsh`, CSRF validates by replaying with an attacker `Origin`, 4 OAuth deterministic types, cache poisoning 3-step proof, JWT none + alg-confusion forgery, 20-concurrent race condition with distribution analysis; Wave 7 added SSO realm 3-state oracle + cross-realm token confused-deputy + UUIDv1 leak + CORS method-set misadvertisement + auth-bypass-headers + HTTP-verb-confusion + admin-debug-paths. A small set is intentionally pass-through where deterministic verification is not practical (`subdomain_takeover` is heuristic-only; `csrf` and the stateful blind-boolean family require custom scaffolding).
- Active probing only: every confirmed finding has hard server evidence (a triggered alert, a leaked secret, a successful injection with a 2xx); the agent's confidence alone never produces `confirmed=true`.
- Cross-agent knowledge sharing via a Blackboard pattern — the IDOR hunter learns about the JWT structure the auth-worker discovered without re-fetching it.
- WAF awareness — agents receive vendor-specific bypass strategies via an injected `WafContext`.
- HackerOne-shape report templates for 30 vulnerability types covering Summary, Vulnerability Details, Prerequisites, Steps to Reproduce, HTTP Evidence, Expected vs Actual, Proof of Concept, Impact, Affected Scope, and Remediation.
- Real CVSS 3.1 calculator wired into report generation; vector strings included.
- Duplicate detection against H1 hacktivity (verified live), GitHub Security Advisories, and the local Qdrant memory; uses 64-bit FNV-1a SimHash with proper bigint shifts (after a v17 fix to a previous 32-bit-aliasing bug).
- Severity prediction with calibration-staleness warnings — the predictor flags when its 2025 industry-average baseline has aged past 365 days and historical data is sparse.
- HackerOne API integration with attachment upload, draft preview, and a `ReportReviewModal` submission gate that blocks submission on a duplicate-skip verdict, an F-grade quality score, a missing description, or insufficient reproduction steps.
- JS-rendered crawler (opt-in via `useHeadlessBrowser: true` on `CrawlConfig`) that drives a real browser via Playwright and discovers SPA endpoints that HTTP-only crawlers miss (Angular / React / Vue). The default is HTTP-only for back-compat; flip the flag when targeting JS-heavy applications.
- OpenAPI / Swagger / GraphQL schema parsing → endpoint catalog → task generation.
- Subdomain enumeration, technology fingerprinting, parameter mining, and Nuclei template integration for known-vulnerability scanning.
- Kill switch — atomic, persistent, fail-safe to ACTIVE; tears down sandbox containers on activation.
- Approval gate with 60-second timeout, audit trail logging, and per-category auto-approve rules.
- Adaptive rate controller per-domain, with WAF detection.
- Stealth headers — 19 user-agent profiles, randomized.
- Proxy rotation with health checking; failed proxies evicted automatically; poisoned-mutex recovery in the global pool.
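The CVSS 3.1 calculator mentioned above follows the base-score formulas in the FIRST specification. The sketch below implements those formulas, including the spec's integer-based Roundup, in TypeScript; it is illustrative rather than the code in Huntress's report generator, and `baseScore` and `vectorString` are hypothetical names.

```typescript
// Illustrative CVSS v3.1 base-score calculator (FIRST specification formulas).
type Scope = "U" | "C";
type CiaLevel = "H" | "L" | "N";

interface CvssBaseMetrics {
  AV: "N" | "A" | "L" | "P";
  AC: "L" | "H";
  PR: "N" | "L" | "H";
  UI: "N" | "R";
  S: Scope;
  C: CiaLevel;
  I: CiaLevel;
  A: CiaLevel;
}

// Metric weights from the v3.1 specification.
const AV_WEIGHT = { N: 0.85, A: 0.62, L: 0.55, P: 0.2 };
const AC_WEIGHT = { L: 0.77, H: 0.44 };
const UI_WEIGHT = { N: 0.85, R: 0.62 };
const CIA_WEIGHT = { H: 0.56, L: 0.22, N: 0 };

// Privileges Required is weighted higher when scope changes.
function prWeight(pr: CvssBaseMetrics["PR"], scope: Scope): number {
  if (pr === "N") return 0.85;
  if (pr === "L") return scope === "C" ? 0.68 : 0.62;
  return scope === "C" ? 0.5 : 0.27;
}

// v3.1 Roundup: smallest one-decimal value >= input, computed on integers to
// avoid floating-point artefacts (4.02 -> 4.1, 4.00 -> 4.0).
function roundUp(value: number): number {
  const scaled = Math.round(value * 100000);
  return scaled % 10000 === 0 ? scaled / 100000 : (Math.floor(scaled / 10000) + 1) / 10;
}

function baseScore(m: CvssBaseMetrics): number {
  const iss = 1 - (1 - CIA_WEIGHT[m.C]) * (1 - CIA_WEIGHT[m.I]) * (1 - CIA_WEIGHT[m.A]);
  const impact =
    m.S === "U" ? 6.42 * iss : 7.52 * (iss - 0.029) - 3.25 * Math.pow(iss - 0.02, 15);
  if (impact <= 0) return 0;
  const exploitability =
    8.22 * AV_WEIGHT[m.AV] * AC_WEIGHT[m.AC] * prWeight(m.PR, m.S) * UI_WEIGHT[m.UI];
  const raw = m.S === "U" ? impact + exploitability : 1.08 * (impact + exploitability);
  return roundUp(Math.min(raw, 10));
}

// Vector string included alongside the score in a report.
function vectorString(m: CvssBaseMetrics): string {
  return `CVSS:3.1/AV:${m.AV}/AC:${m.AC}/PR:${m.PR}/UI:${m.UI}/S:${m.S}/C:${m.C}/I:${m.I}/A:${m.A}`;
}
```

For `AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:H/A:H` this yields 9.8, and for the common reflected-XSS vector `AV:N/AC:L/PR:N/UI:R/S:C/C:L/I:L/A:N` it yields 6.1, the same scores the official FIRST calculator gives for those vectors.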
Each agent self-registers via registerAgent() at module import time. Registration is centralized in src/agents/standardized_agents.ts.
| Agent | Vulnerability class | Tier |
|---|---|---|
| Recon Agent | Subdomain enumeration, tech fingerprinting, endpoint discovery | Haiku |
| Auth Worker Agent | Login flow detection, session acquisition, token refresh | Sonnet |
| OAuth (4 sub-modules) | redirect_uri override, missing state, PKCE downgrade, scope escalation | Sonnet |
| SSRF Hunter | Server-side request forgery, internal service access | Sonnet |
| XSS Hunter | Reflected, stored, and DOM-based cross-site scripting | Sonnet |
| SQLi Hunter | Error-based, blind boolean, blind time-based across multiple engines | Sonnet |
| NoSQL Hunter | NoSQL injection (MongoDB, CouchDB, etc.) | Sonnet |
| GraphQL Hunter | Introspection, batching, nested query depth | Haiku |
| IDOR Hunter | Insecure direct object references, BOLA, two-account access proof | Sonnet |
| SSTI Hunter | Server-side template injection (Jinja2, Freemarker, Velocity, Pug) | Sonnet |
| Command Injection Hunter | OS command execution via user input, OOB callback verification | Sonnet |
| Path Traversal Hunter | Directory traversal and local file inclusion | Sonnet |
| CORS Hunter | Origin reflection, null origin, subdomain wildcard, credential inclusion | Haiku |
| Host Header Hunter | Host header injection, cache poisoning, password-reset link manipulation | Haiku |
| Open Redirect Hunter | URL redirect chains, javascript:-scheme detection | Haiku |
| Prototype Pollution Hunter | JavaScript prototype chain manipulation, gadget chains | Sonnet |
| CRLF Hunter | HTTP header injection via carriage-return / line-feed | Haiku |
| HTTP Smuggling Hunter | Request smuggling (CL.TE, TE.CL, TE.TE) | Sonnet |
| XXE Hunter | XML external entity injection (direct + blind via OOB) | Sonnet |
| JWT Hunter | Algorithm confusion, alg=none, kid injection, jwk smuggling | Sonnet |
| SAML Hunter | Signature wrapping, comment injection, unsigned-assertion replay | Sonnet |
| WebSocket Hunter | Cross-Site WebSocket Hijacking (CSWSH), origin enforcement | Haiku |
| Race Condition Hunter | TOCTOU, double-spend, parallel-request races (20 concurrent probes) | Sonnet |
| Deserialization Hunter | Java serializable, Python pickle, PHP unserialize, .NET BinaryFormatter | Sonnet |
| Cache Hunter | Web cache poisoning, cache deception | Sonnet |
| Subdomain Takeover Hunter | Dangling DNS records, unclaimed cloud resources | Haiku |
| MFA Bypass Hunter | OTP / TOTP / WebAuthn bypass paths | Sonnet |
| Business Logic Hunter | Negative-quantity, zero-cost, currency manipulation patterns | Sonnet |
| Prompt Injection Hunter | LLM prompt injection in AI-powered features | Sonnet |
Tier assignments are enforced by COMPLEXITY_LOCKED_AGENTS in cost_router.ts for security-sensitive types — keyword-based "this looks complex, upgrade to Opus" inflation is blocked for them.
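A sketch of tier selection with a lock set, assuming a simple keyword heuristic is what would otherwise inflate the tier. The agent ids, `DEFAULT_TIER`, `LOCKED_AGENTS`, and the keyword regex are hypothetical stand-ins for the real table in `cost_router.ts`.

```typescript
type Tier = "haiku" | "sonnet" | "opus";

// Hypothetical defaults and lock set (illustrative ids only).
const DEFAULT_TIER: Record<string, Tier> = {
  recon: "haiku",
  cors: "haiku",
  sqli: "sonnet",
  xss: "sonnet",
};
const LOCKED_AGENTS = new Set<string>(["sqli", "xss"]);

// Keywords that might tempt a router into upgrading the model (assumption).
const COMPLEXITY_KEYWORDS = /\b(complex|chained|advanced|critical)\b/i;

function selectTier(agentId: string, taskText: string): Tier {
  const base = DEFAULT_TIER[agentId] ?? "haiku";
  // Locked agents keep their assigned tier regardless of the task text.
  if (LOCKED_AGENTS.has(agentId)) return base;
  // Unlocked agents may be bumped one tier on complexity keywords.
  if (COMPLEXITY_KEYWORDS.test(taskText)) {
    return base === "haiku" ? "sonnet" : "opus";
  }
  return base;
}
```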
Defense in depth, with multiple independent enforcement layers:
safe_to_test.rs (1,468 LOC) parses HackerOne JSON scope and enforces a strict default-deny policy. Supports wildcard domains, CIDR notation (IPv4 + IPv6), IP ranges, port lists, and per-host port restrictions. Recent hardening: IPv4-mapped IPv6 normalization (::ffff:a.b.c.d no longer bypasses an IPv4 CIDR), full IDN punycode canonicalization, homoglyph differentiation. 41 dedicated tests with positive AND negative cases.
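A minimal TypeScript sketch of the default-deny matching order (out-of-scope wins, then in-scope, otherwise deny) with the IPv4-mapped IPv6 normalization described above. It covers only exact hosts, `*.` wildcards (assumed here to match subdomains but not the apex), and IPv4 CIDR; IPv6 CIDR, IP ranges, ports, and IDN canonicalization are omitted. All function names are hypothetical, and the real validator is the Rust code in `safe_to_test.rs`.

```typescript
interface ScopeRules {
  inScope: string[]; // e.g. "*.example.com", "api.example.org", "192.0.2.0/24"
  outOfScope: string[];
}

// Parse dotted-quad IPv4 to an unsigned integer; null if not IPv4.
function ipv4ToInt(ip: string): number | null {
  const parts = ip.split(".");
  if (parts.length !== 4) return null;
  let value = 0;
  for (const part of parts) {
    if (!/^\d{1,3}$/.test(part)) return null;
    const octet = Number(part);
    if (octet > 255) return null;
    value = value * 256 + octet; // arithmetic avoids signed 32-bit bitwise ops
  }
  return value;
}

// Lowercase, drop a trailing dot and IPv6 brackets, and map "::ffff:a.b.c.d"
// to "a.b.c.d" so a mapped address cannot dodge an IPv4 rule.
function normalizeHost(host: string): string {
  let h = host.trim().toLowerCase().replace(/\.$/, "");
  if (h.startsWith("[") && h.endsWith("]")) h = h.slice(1, -1);
  const mapped = /^::ffff:(\d{1,3}(?:\.\d{1,3}){3})$/.exec(h);
  return mapped ? mapped[1] : h;
}

function matchesRule(host: string, rule: string): boolean {
  const r = rule.toLowerCase();
  if (r.includes("/")) {
    const [base, bitsText] = r.split("/");
    const ip = ipv4ToInt(host);
    const net = ipv4ToInt(base);
    const bits = Number(bitsText);
    if (ip === null || net === null || !Number.isInteger(bits) || bits < 0 || bits > 32) return false;
    const blockSize = 2 ** (32 - bits); // addresses per CIDR block
    return Math.floor(ip / blockSize) === Math.floor(net / blockSize);
  }
  if (r.startsWith("*.")) return host.endsWith(r.slice(1)); // subdomains only
  return host === r;
}

function isInScope(rawHost: string, rules: ScopeRules): boolean {
  const host = normalizeHost(rawHost);
  if (rules.outOfScope.some((rule) => matchesRule(host, rule))) return false;
  return rules.inScope.some((rule) => matchesRule(host, rule)); // no match: deny
}
```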
Before any state-changing command executes against a live target, an ApproveDenyModal presents the exact command, the requesting agent, the target, the safety category, and any warnings. The user can approve, deny, modify, or pause. Per-category auto-approve rules are available behind an explicit opt-in confirmation dialog. The approval promise has a 60-second timeout backed by an audit-trail log.
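A sketch of a timed approval promise, assuming that a timeout or an error resolves to "deny" (consistent with the default-deny posture, though the behavior on timeout is not stated above). `awaitApproval` and `requestDecision` are hypothetical; `requestDecision` stands in for whatever resolves when the operator responds in `ApproveDenyModal`.

```typescript
type Decision = "approve" | "deny";

// Race the operator's decision against a timeout; timeouts and errors fail
// closed to "deny" in this sketch. `audit` receives one line per request.
async function awaitApproval(
  requestDecision: () => Promise<Decision>,
  timeoutMs: number = 60_000,
  audit: (entry: string) => void = () => {},
): Promise<Decision> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<"timeout">((resolve) => {
    timer = setTimeout(() => resolve("timeout"), timeoutMs);
  });
  try {
    const outcome = await Promise.race([requestDecision(), timeout]);
    audit(`approval outcome: ${outcome}`);
    return outcome === "timeout" ? "deny" : outcome;
  } catch {
    audit("approval outcome: error");
    return "deny";
  } finally {
    clearTimeout(timer); // do not leave the timeout pending after a decision
  }
}
```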
kill_switch.rs uses atomic state + file persistence with fsync. Activation broadcasts to all subscribers and calls Sandbox::destroy_all(). The fail-safe on a corrupted state file defaults to ACTIVE — the safest possible state.
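A TypeScript sketch of the fail-safe load path and the activation fan-out, assuming a hypothetical JSON state file of the form `{"active": boolean}`. `kill_switch.rs` is Rust and its on-disk format is not documented here; persistence (write + fsync) is omitted, and `parseKillSwitchState` and `KillSwitch` are hypothetical names.

```typescript
type KillSwitchState = "ACTIVE" | "INACTIVE";

// Only an explicit, well-formed `false` disarms the switch; corrupted,
// truncated, or unexpected contents fall back to ACTIVE.
function parseKillSwitchState(contents: string): KillSwitchState {
  try {
    const parsed: unknown = JSON.parse(contents);
    if (
      typeof parsed === "object" &&
      parsed !== null &&
      (parsed as { active?: unknown }).active === false
    ) {
      return "INACTIVE";
    }
  } catch {
    // Unparseable file: fall through to the safe default.
  }
  return "ACTIVE";
}

// Activation broadcasts to subscribers, then destroys all sandboxes.
class KillSwitch {
  private readonly subscribers: Array<() => void> = [];

  constructor(
    private state: KillSwitchState,
    private readonly destroyAllSandboxes: () => void,
  ) {}

  subscribe(listener: () => void): void {
    this.subscribers.push(listener);
  }

  activate(): void {
    this.state = "ACTIVE";
    for (const listener of this.subscribers) listener();
    this.destroyAllSandboxes();
  }

  isActive(): boolean {
    return this.state === "ACTIVE";
  }
}
```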
pty_manager.rs uses CommandBuilder with explicit argv arrays — never shell string interpolation. Validates against dangerous characters (|, &, ;, $) and sanitizes environment variables. The training command path additionally locks Python invocations to scripts under allowed directories.
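A sketch of argv validation before spawning, assuming forbidden characters are rejected rather than escaped. The first four characters come from the list above; backtick, CR/LF, and NUL are added as assumptions. `validateArgv` is hypothetical.

```typescript
// |, &, ; and $ from the list above; backtick, CR, LF, NUL are sketch additions.
const FORBIDDEN = /[|&;$`\r\n\x00]/;

function validateArgv(argv: readonly string[]): string[] {
  if (argv.length === 0 || argv[0].trim() === "") {
    throw new Error("empty command");
  }
  argv.forEach((arg, index) => {
    if (FORBIDDEN.test(arg)) {
      // Report the position, not the value, so secrets never reach error text.
      throw new Error(`argv[${index}] contains a forbidden character`);
    }
  });
  return argv.slice();
}
```

The validated array would then go to a spawner that never invokes a shell, for example Node's `child_process.spawn(argv[0], argv.slice(1))`, whose `shell` option defaults to `false`. Note that a blanket `&` ban also rejects URLs with query strings, so a real tool path may need narrower rules.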
secure_storage.rs uses AES-256-GCM with HKDF key derivation and per-encryption random nonces. SettingsContext strips apiKeys before any localStorage.setItem(). Credentials are never written to disk in plaintext, never logged, and never included in error messages.
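A sketch of the same construction using Node's `crypto` module (the Rust side uses `ring`): HKDF-SHA256 derives a 256-bit key, each encryption draws a fresh 96-bit nonce, and the nonce and GCM tag are stored alongside the ciphertext. The `info` label and the blob layout are assumptions, not the format used by `secure_storage.rs`.

```typescript
import { createCipheriv, createDecipheriv, hkdfSync, randomBytes } from "node:crypto";

// HKDF-SHA256 to a 32-byte AES-256 key; the info label is hypothetical.
function deriveKey(masterSecret: Buffer, salt: Buffer): Buffer {
  return Buffer.from(hkdfSync("sha256", masterSecret, salt, "huntress-demo/v1", 32));
}

// Blob layout (assumed): nonce (12 bytes) | auth tag (16 bytes) | ciphertext.
function encrypt(key: Buffer, plaintext: string): Buffer {
  const nonce = randomBytes(12); // fresh random nonce per encryption
  const cipher = createCipheriv("aes-256-gcm", key, nonce);
  const body = Buffer.concat([cipher.update(plaintext, "utf8"), cipher.final()]);
  return Buffer.concat([nonce, cipher.getAuthTag(), body]);
}

function decrypt(key: Buffer, blob: Buffer): string {
  const nonce = blob.subarray(0, 12);
  const tag = blob.subarray(12, 28);
  const body = blob.subarray(28);
  const decipher = createDecipheriv("aes-256-gcm", key, nonce);
  decipher.setAuthTag(tag);
  // final() throws if the ciphertext or tag was tampered with.
  return Buffer.concat([decipher.update(body), decipher.final()]).toString("utf8");
}
```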
sandbox.rs creates Docker / Podman containers with ReadonlyRootfs, all capabilities dropped (only NET_RAW added), no-new-privileges, non-root user, 2 GB memory cap, 1 CPU, 256 PIDs. Every blocked-prefix and blocked-exact env var is explicitly tested. Tinyproxy enforces scope on shell tools (curl, wget, git, pip, etc.).
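A sketch of a Docker Engine API create-container body carrying the restrictions listed above. The field names (`ReadonlyRootfs`, `CapDrop`, `CapAdd`, `SecurityOpt`, `Memory`, `NanoCpus`, `PidsLimit`) are Engine API fields; the user id, image, and environment blocklists are hypothetical placeholders, not the values in `sandbox.rs`.

```typescript
// Hypothetical environment blocklists.
const BLOCKED_ENV_PREFIXES = ["AWS_", "ANTHROPIC_", "HUNTRESS_"];
const BLOCKED_ENV_EXACT = new Set(["LD_PRELOAD", "LD_LIBRARY_PATH", "HACKERONE_API_TOKEN"]);

// Drop blocked variables and format the rest as NAME=value for the Engine API.
function filterEnv(env: Record<string, string>): string[] {
  return Object.entries(env)
    .filter(
      ([name]) =>
        !BLOCKED_ENV_EXACT.has(name) && !BLOCKED_ENV_PREFIXES.some((p) => name.startsWith(p)),
    )
    .map(([name, value]) => `${name}=${value}`);
}

function sandboxCreateBody(image: string, env: Record<string, string>) {
  return {
    Image: image,
    User: "1000:1000", // non-root (placeholder uid:gid)
    Env: filterEnv(env),
    HostConfig: {
      ReadonlyRootfs: true,
      CapDrop: ["ALL"],
      CapAdd: ["NET_RAW"],
      SecurityOpt: ["no-new-privileges"],
      Memory: 2 * 1024 ** 3, // 2 GiB in bytes
      NanoCpus: 1_000_000_000, // 1 CPU
      PidsLimit: 256,
    },
  };
}
```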
proxy_pool.rs supports HTTP, HTTPS, and SOCKS5 with round-robin / random / least-recently-used / fastest-first strategies. Continuous health checking. Failed proxies evicted from the active pool. Poisoned-mutex recovery instead of process-wide panic.
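A sketch of rotation plus health-based eviction, showing two of the four strategies (round-robin and least-recently-used). `ProxyPool`, `ProxyEntry`, the `maxFailures` threshold, and the report-based eviction rule are assumptions.

```typescript
interface ProxyEntry {
  url: string; // e.g. "socks5://127.0.0.1:9050"
  lastUsed: number; // monotonic counter, not wall-clock time
  failures: number; // consecutive failed health checks
}

class ProxyPool {
  private active: ProxyEntry[];
  private cursor = 0;
  private tick = 0;

  constructor(urls: string[], private readonly maxFailures: number = 3) {
    this.active = urls.map((url) => ({ url, lastUsed: 0, failures: 0 }));
  }

  next(strategy: "round-robin" | "lru"): string {
    if (this.active.length === 0) throw new Error("no healthy proxies");
    let chosen: ProxyEntry;
    if (strategy === "round-robin") {
      chosen = this.active[this.cursor % this.active.length];
      this.cursor++;
    } else {
      chosen = this.active.reduce((a, b) => (b.lastUsed < a.lastUsed ? b : a));
    }
    chosen.lastUsed = ++this.tick;
    return chosen.url;
  }

  // Called by the health checker; a proxy is evicted after maxFailures in a row.
  report(url: string, healthy: boolean): void {
    const proxy = this.active.find((p) => p.url === url);
    if (!proxy) return;
    proxy.failures = healthy ? 0 : proxy.failures + 1;
    if (proxy.failures >= this.maxFailures) {
      this.active = this.active.filter((p) => p !== proxy);
    }
  }

  get size(): number {
    return this.active.length;
  }
}
```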
All command executions are recorded in asciinema format. Decision logs, agent reasoning traces, and tool invocations are captured for post-session review and (opt-in) training-data collection.
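For reference, asciinema's asciicast v2 format is a JSON header line followed by one JSON array per event, `[seconds_since_start, "o" | "i", data]`. A minimal writer is sketched below; `AsciicastRecorder` is hypothetical and keeps lines in memory rather than appending to a file.

```typescript
class AsciicastRecorder {
  private readonly lines: string[] = [];
  private readonly startMs: number;

  constructor(width: number, height: number, startMs: number = Date.now()) {
    this.startMs = startMs;
    // Header line: format version, terminal size, Unix start time in seconds.
    this.lines.push(
      JSON.stringify({ version: 2, width, height, timestamp: Math.floor(startMs / 1000) }),
    );
  }

  // Terminal output event ("o").
  output(data: string, atMs: number = Date.now()): void {
    this.lines.push(JSON.stringify([(atMs - this.startMs) / 1000, "o", data]));
  }

  // Keyboard input event ("i").
  input(data: string, atMs: number = Date.now()): void {
    this.lines.push(JSON.stringify([(atMs - this.startMs) / 1000, "i", data]));
  }

  // Newline-delimited recording, ready to write to a .cast file.
  toString(): string {
    return this.lines.join("\n") + "\n";
  }
}
```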
| Component | Crate | Purpose |
|---|---|---|
| Desktop runtime | tauri 2.0 | Native packaging, IPC, system integration |
| Async runtime | tokio | Concurrent I/O |
| Cryptography | ring | Key derivation, AES-GCM |
| HTTP client | reqwest | Outbound requests with proxy support |
| Container API | bollard | Docker / Podman sandbox management |
| Subprocess PTY | portable-pty | Isolated execution with recording |
| Embedded SQL | rusqlite | Knowledge graph + benchmark persistence |
| Errors | thiserror | Typed hierarchies (anyhow only at binary entry) |
| Logging | tracing + tracing-subscriber | Structured logs with spans |
| Component | Library | Purpose |
|---|---|---|
| UI framework | React 19 | Component architecture |
| Build tool | Vite 7 | Dev server + production bundle |
| Styling | Tailwind CSS 4 | Utility-first dark theme |
| Terminal | xterm.js (@xterm/xterm + addon-fit + addon-web-links) | Embedded PTY view — full ANSI rendering for tool output (nmap, sqlmap, nuclei, ffuf, dalfox, gobuster), clickable URLs, fit-to-panel resize |
| Charts | Recharts | Benchmark + cost dashboards |
| Virtual scrolling | react-virtuoso | Large finding lists |
| Markdown | react-markdown | Report rendering |
| Testing | Vitest + Testing Library | Unit + integration |
| Component | Technology | Purpose |
|---|---|---|
| Orchestrator | Anthropic Claude Opus 4.6 | Coordination, planning, synthesis |
| Specialist agents | Claude Sonnet 4.6 / Haiku 4.5 | Per-tier agent reasoning |
| Vector database | Qdrant | Semantic dedup, technique recall |
| Browser automation | Playwright | XSS dialog detection, JS-rendered crawl |
| OOB infrastructure | interactsh + Burp Collaborator + DNS canary | Blind SSRF / XXE / RCE confirmation |
| H1 integration | HackerOne REST API | Program import, report submission |
| Requirement | Minimum | Notes |
|---|---|---|
| OS | Linux (Kali tested) | macOS and Windows supported via Tauri |
| Node.js | 18+ | Frontend build |
| Rust | 1.70+ stable | Backend compilation |
| Docker | 20+ | Qdrant + sandbox containers |
| Python | 3.10+ | Optional — only for the experimental training pipeline |
| NVIDIA GPU (24 GB+ VRAM) | — | Optional — only for local LoRA fine-tuning |
git clone https://github.com/JBWolfFlow/Huntress.git
cd Huntress
# One-time bootstrap (installs system tooling, sets up directories)
chmod +x scripts/setup.sh
./scripts/setup.sh
# Node dependencies
npm install
# Start Qdrant in the background
docker compose up -d
# Launch the development build
npm run tauri dev
npm run tauri build
# Binary: src-tauri/target/release/
# Installers: src-tauri/target/release/bundle/{deb,AppImage,dmg,msi}/
chmod +x scripts/install_security_tools.sh
./scripts/install_security_tools.sh
This installs nmap, sqlmap, gobuster, ffuf, nuclei, subfinder, httpx, dnsx, wafw00f, and the rest of the agent toolkit.
The setup wizard collects:
- AI provider — pick from Anthropic / OpenAI / Google / OpenRouter / Local (Ollama).
- API key — written to the OS keychain via Tauri's secure-storage abstraction; never to disk in plaintext.
- Per-agent model overrides (optional) — defaults are cost-tier-optimized.
- HackerOne API token (optional) — required only for direct report submission.
All settings persist between sessions and can be edited from the Settings panel.
# AI providers (at least one required)
ANTHROPIC_API_KEY=sk-ant-...
OPENAI_API_KEY=sk-...
GOOGLE_AI_API_KEY=...
OPENROUTER_API_KEY=...
# HackerOne (optional)
HACKERONE_API_TOKEN=...
# Vector DB
QDRANT_URL=http://localhost:6333
# Experimental training pipeline (off by default; the flag below enables it)
EXPERIMENTAL_TRAINING=1
HTB_API_TOKEN=...
HUGGINGFACE_TOKEN=...
Standard HackerOne JSON:
{
"targets": {
"in_scope": [
{ "asset_identifier": "*.example.com", "asset_type": "URL", "eligible_for_bounty": true },
{ "asset_identifier": "192.0.2.0/24", "asset_type": "CIDR", "eligible_for_bounty": true }
],
"out_of_scope": [
{ "asset_identifier": "admin.example.com", "asset_type": "URL" }
]
}
}
Wildcards, CIDR (IPv4 + IPv6), IP ranges, port specifications, and IDN domains are fully supported. The scope validator normalizes IPv4-mapped IPv6 (`::ffff:a.b.c.d`) and IDN/punycode forms so neither can be used as a bypass.
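A sketch of loading that JSON into the lists a default-deny checker needs. The interfaces mirror only the fields in the example above; `loadScope` is hypothetical, and missing arrays become empty, which a default-deny checker then treats as "nothing in scope".

```typescript
interface H1Asset {
  asset_identifier: string;
  asset_type: string;
  eligible_for_bounty?: boolean;
}

interface H1Scope {
  targets?: { in_scope?: H1Asset[]; out_of_scope?: H1Asset[] };
}

function loadScope(json: string): {
  inScope: string[];
  outOfScope: string[];
  bountyEligible: string[];
} {
  const scope = JSON.parse(json) as H1Scope;
  // Missing arrays become empty lists rather than errors.
  const ids = (assets: H1Asset[] | undefined): string[] =>
    (assets ?? []).map((a) => a.asset_identifier.trim());
  return {
    inScope: ids(scope.targets?.in_scope),
    outOfScope: ids(scope.targets?.out_of_scope),
    bountyEligible: ids(scope.targets?.in_scope?.filter((a) => a.eligible_for_bounty === true)),
  };
}
```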
# Core services
docker compose up -d
# Add the OWASP Juice Shop testing target
docker compose --profile testing up -d
| Service | Port | Purpose |
|---|---|---|
| Qdrant (REST) | 6333 | Vector database |
| Qdrant (gRPC) | 6334 | High-performance vector interface |
| Juice Shop | 3001 | Local testing target (testing profile only) |
- Import a program. New Hunt → paste a HackerOne URL, drop in a scope JSON, or enter scope manually. The orchestrator scrapes the program page (or reads the JSON), parses scope and rules, and produces a structured briefing.
- Choose a strategy. The orchestrator presents 3–5 ranked attack strategies based on asset types, historical bounty data, and detected technologies. Pick one or type a custom instruction.
- Watch the dispatch. The Agent Status panel shows live agent state. The chat displays findings, agent reasoning, and orchestrator updates as they happen.
- Approve commands. Active-test commands surface in an `ApproveDenyModal` with full context. Approve, deny, modify, or pause.
- Triage findings. Confirmed findings appear with severity badges, validation status, duplicate-check results, and quality scores. Drill into any finding for full evidence and HTTP exchanges.
- Submit reports. For confirmed findings, the orchestrator generates a HackerOne-shape PoC report. Review in the Report Editor, edit, and submit through the integrated H1 API.
- Standard mode — every active-test command requires explicit approval.
- Auto-approve (per category) — passive recon auto-approved, active testing still gated.
- Economy mode — Settings → Advanced → Hunt Behavior caps cost per finding.
- Headless / CLI — for benchmark runs and unattended operations; see `npm run benchmark:xbow` and `scripts/htb_runner.py`.
npm run tauri dev # dev server with hot reload
npm run tauri build # production binary + installers
npm run lint # tsc + eslint + prettier
cd src-tauri && cargo clippy -- -D warnings
cd src-tauri && cargo fmt
Rust (src-tauri/src/)
- `thiserror` for typed errors; `anyhow` only at binary entry points.
- Exhaustive enum matching — no wildcard `_` on enums that may grow.
- `Arc<Mutex<T>>` for shared state with minimal lock duration.
- `tracing` for structured logging with spans.
- Every Tauri command validates input before processing.
- Mutex poisoning is recovered via `into_inner()`, not `.expect()` (no process-wide panics from unrelated thread failures).
TypeScript (src/)
- Strict mode; no implicit `any`.
- Interfaces over type aliases for extensible object shapes.
- `async`/`await` exclusively — no raw `.then()` chains.
- Functional React with hooks only.
- Every `invoke()` call has a typed command/response pair.
Command execution
- argv arrays, never template-literal shell strings.
- Null-byte-joined wire format: `['cmd', 'arg1'].join('\x00')`.
- Every command passes through scope validation and the approval gate.
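A sketch of the null-byte wire format on the TypeScript side, with the receiving side's split shown in TypeScript too for a round trip (the real receiver is Rust). `encodeArgv` and `decodeArgv` are hypothetical; rejecting arguments that already contain NUL is an assumption that keeps the framing unambiguous.

```typescript
// Join argv with NUL separators; an argument containing NUL would corrupt
// the framing, so it is rejected.
function encodeArgv(argv: string[]): string {
  if (argv.length === 0) throw new Error("empty argv");
  for (const arg of argv) {
    if (arg.includes("\x00")) throw new Error("argument contains a NUL byte");
  }
  return argv.join("\x00");
}

// Receiving side: split back into the original elements (empty strings survive).
function decodeArgv(wire: string): string[] {
  return wire.split("\x00");
}
```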
npm test # full TypeScript suite (Vitest)
npm run test:watch # watch mode
npm run test:coverage # with coverage reporting
npm run test:live # integration tests (needs running services)
cd src-tauri && cargo test # Rust suite
| Suite | Result | Notes |
|---|---|---|
| TypeScript | 2,721 passing / 18 skipped / 0 failing | across 121 test files |
| Rust | 143 passing / 0 failing / 1 ignored | unit + integration |
| `tsc --noEmit --skipLibCheck` | clean | |
| `cargo clippy --all-targets -- -D warnings` | clean | |
| Category | What it covers | Config |
|---|---|---|
| Unit | Individual modules in isolation | vitest.config.ts (30 s timeout) |
| Integration | Multi-module + external service interactions | vitest.integration.config.ts (120 s timeout) |
| Agent fleet | Every agent initializes, dispatches, and reports correctly | agent_fleet.test.ts |
| Security | Scope-deny paths, kill-switch fail-safe, approval-pipeline rejection | Multiple files |
| Validators | Each of 66 validators with positive AND negative cases | *_validators.test.ts, v16_h1_validators.test.ts, wave7_b*.test.ts |
| Provider | API-key validation, streaming, fallback chains | provider_fallback.test.ts |
| CIDR / IDN | IPv4-mapped IPv6, homoglyph, edge prefixes (/0, /32, /128) | safe_to_test.rs |
| SimHash | 64-bit entropy invariants, near-dup grouping | finding_dedup.test.ts |
| Compose / readiness | YAML regex strictness, HTTPS-aware probe parallelism | v17_*.test.ts |
huntress/
├── src/ # Frontend (React / TypeScript)
│ ├── agents/ # 27 specialist agents + auth + recon
│ │ ├── oauth/ # 4 OAuth sub-modules + discovery
│ │ ├── base_agent.ts # Abstract base + finding types
│ │ ├── agent_catalog.ts # Registry
│ │ ├── agent_router.ts # Selection + dispatch
│ │ └── standardized_agents.ts # Self-registration trigger
│ ├── components/ # React UI surfaces
│ │ ├── ChatInterface.tsx # Primary interaction
│ │ ├── ApproveDenyModal.tsx # Human approval gate
│ │ ├── ReportReviewModal.tsx # Submission gate
│ │ └── ...
│ ├── core/
│ │ ├── orchestrator/ # Coordinator engine, dispatch, dedup
│ │ ├── engine/ # ReAct loop, tool schemas, chain summarizer
│ │ ├── providers/ # ModelProvider abstractions
│ │ ├── reporting/ # PoC generation, templates, H1 API
│ │ ├── validation/ # 27 deterministic validators + OOB server
│ │ ├── http/ # Request engine, scope check, rate control
│ │ ├── memory/ # Qdrant integration, hunt history
│ │ ├── benchmark/ # XBOW runner, persistence, scoring
│ │ ├── auth/ # Session manager, token refresh
│ │ ├── discovery/ # Crawler, JS analyzer, schema parser
│ │ └── ...
│ └── tests/ # 120 test files
├── src-tauri/ # Backend (Rust / Tauri 2.0)
│ └── src/
│ ├── lib.rs # 50+ Tauri commands, module integration
│ ├── safe_to_test.rs # Scope validator (1,468 LOC, 49 tests)
│ ├── pty_manager.rs # Secure subprocess execution
│ ├── kill_switch.rs # Emergency shutdown with persistence
│ ├── proxy_pool.rs # HTTP/HTTPS/SOCKS5 rotation
│ ├── secure_storage.rs # OS keychain integration
│ ├── sandbox.rs # Container isolation
│ ├── agent_browser.rs # Playwright Node subprocess manager
│ ├── h1_api.rs # HackerOne REST client
│ └── tool_checker.rs # Security tool availability checks
├── scripts/ # Automation
│ ├── setup.sh # Bootstrap
│ ├── install_security_tools.sh # Tool installer
│ ├── htb_runner.py # HackTheBox training (experimental)
│ └── deploy_production.sh # Gradual model deployment
├── docker-compose.yml # Qdrant + testing services
├── PIPELINE.md # Single source of truth for open work
└── README.md # This file
| Metric | Value |
|---|---|
| TypeScript / TSX source files | 330 |
| TypeScript LOC (approx) | ~89,000 |
| Rust source files | 11 |
| Rust LOC | ~7,300 |
| Specialist hunting agents | 27 (+ 2 utility: Auth Worker, Recon) |
| OAuth sub-modules | 4 |
| Validator registrations | 51 (across 47 unique vulnerability types) |
| HackerOne report templates | 30 |
| React components | 19 |
| Tauri IPC commands | 53 (across 11 Rust files) |
| TypeScript tests | 2,721 |
| Rust tests | 143 |
This section is intentionally explicit. Marketing-style claims that paper over real gaps make the platform less useful, not more.
| Gap | What it means | Status |
|---|---|---|
| Zero triaged HackerOne submissions | We have not yet had a Huntress-generated report accepted, marked duplicate, marked informative, or marked N/A on a real H1 program. Every quality claim about reports is therefore unproven against live triage criteria. | Open milestone — first submission is the next major step. |
| Benchmark cohort coverage uneven | The v18 38/103 result is the first full-corpus completion, but XSS (4%) and SSRF (0/3) remain capability gaps. L3 challenges (1/8 = 13%) are barely sampled at this corpus size. | Targeted prompt + browser-flow work for XSS hunter; next benchmark batch. |
| Quality scorer not validated against H1 | report_quality.ts produces letter grades, but those grades have never been compared to real H1 triage outcomes. The scorer reflects what we think a good H1 report looks like, not what triagers actually accept. | Will be calibrated once submissions accumulate. |
| Training pipeline is experimental | The Axolotl + LoRA local fine-tuning path requires a 24 GB+ GPU and has not been validated end-to-end. Behind the EXPERIMENTAL_TRAINING feature flag, excluded from the default test run. | Research preview only. |
| God-object refactors deferred | react_loop.ts (~2.5K LOC), orchestrator_engine.ts (~3.4K LOC), and validator.ts (~5.3K LOC) are large. Concrete pain hasn't appeared yet, so refactoring is deferred rather than premature. | Tracked but not blocking. |
The single source of truth for open and closed work is PIPELINE.md.
Huntress is for authorized security testing only. This includes:
- Bug bounty programs with explicit, written authorization (HackerOne, Bugcrowd, Intigriti, etc.).
- Penetration-testing engagements with signed scope agreements.
- Security research on systems you own or have written permission to test.
- Educational use in controlled environments (CTFs, deliberately vulnerable VMs, your own lab).
Users are solely responsible for ensuring they have proper authorization before testing any target. Unauthorized access to computer systems is illegal under the Computer Fraud and Abuse Act (CFAA) in the United States and equivalent legislation in other jurisdictions. The authors and contributors assume no liability for misuse.
The default-deny scope validator is the first line of defense. The human approval gate is the second. The kill switch is the third. None of these substitute for the operator confirming, in writing, that they have authorization for every target they configure.
MIT — see LICENSE for the full text.
- Tauri — desktop application framework
- Anthropic — Claude AI models powering the orchestrator + specialist fleet
- Qdrant — vector database
- Playwright — browser automation for validation + JS-rendered crawl
- Project Discovery — `interactsh`, `nuclei`, `subfinder`, `httpx`
- XBOW — the public Validation Benchmark used as our primary eval harness
- HackerOne — bug bounty platform and API
- HackTheBox — training environment for the experimental learning pipeline
Built and maintained by JBWolfFlow under NeuroForge Technologies.