Huntress

AI-driven bug bounty automation for HackerOne — research preview

Huntress is a Tauri 2.0 desktop application that automates HackerOne bug bounty hunting through a coordinated fleet of AI agents. A primary orchestrator (Claude Opus 4.6) ingests a program's scope, plans an attack strategy, and dispatches 27 specialized vulnerability-hunting agents — each running its own ReAct loop on a cost-tiered model (Haiku or Sonnet). All target interactions pass through a Rust scope validator, a default-deny security model, and a human approval gate.

This is a research preview, not a production tool. The platform infrastructure is solid and well-tested, but no Huntress-generated report has been triaged on a real HackerOne program yet — that is the next milestone, not a past achievement. Use accordingly.

Current Status

Indicator	Value	Source
XBOW benchmark	38 / 103 challenges solved (36.89%)	v18 second published run, 2026-05-09 (commit `6e8450e`) — 52% L1, 27% L2, 13% L3; $58.05; 5h33m; Sonnet 4.6
HackerOne triaged submissions	0	First submission is an open milestone
TypeScript tests	3,857 passing / 18 skipped / 0 failing	`npx vitest run` (169 files)
Rust tests	159 passing / 0 failing / 1 ignored	`cd src-tauri && cargo test`
Type / lint	`tsc --noEmit` clean · `cargo clippy --all-targets -- -D warnings` clean	CI
Open audit findings	0	All 9 BLOCKERs, 7 HIGH-severity, and 8 MED-severity items closed across v15–v17
Hunt history	12 sessions (6 OWASP Juice Shop training + 6 real-world H1 programs)	Internal logs

The XBOW figure above comes from v18, the second published run and the first to complete all 103 challenges (v13.3 aborted at 55/103 by credit exhaustion). Compared with v13.3, L2 rose from 2% to 27% and L3 from 0% to 13%. XSS remains the largest capability gap at a 4% solve rate; see PIPELINE.md §13 for the full per-category breakdown.

What changed recently

Wave 7 (2026-05-18 to 2026-05-19): Anthropic CVP integration. Official Cyber Verification Program approval was received on 2026-05-13 for org 64d13134-7083-4aca-a808-fb16a2d6b9d0, covering dual-use cybersecurity activities with mass exfiltration and ransomware as enforced carve-outs. 9 commits shipped:
- A7, a CVP-aware authorization preamble.
- A1a, a centralized payload library with 14 carve-out safety gates.
- A1b, end-to-end cvpStatus wiring in the orchestrator.
- A1c-A1e, payload-library migrations for 8 hunter agents (xss, sqli, ssrf, ssti, cmd_injection, proto_pollution, deserialization, path_traversal).
- B1+B2, an SSO realm 3-state oracle and cross-realm token validators.
- B3+B4+B7+B8+B9, a deterministic validator bundle (uuidv1_leak, cors_method_misadvertised, auth_bypass_headers, http_verb_confusion, admin_debug_paths) derived from the Quixel SSO and KWS engagement findings.
- C5, adaptive agent dispatch driven by Phase 0.5 disclosed-report dedup analysis: avoid-list agents are dropped and underserved-list agents are priority-boosted.
To activate aggressive payloads across all 8 migrated agents, set HUNTRESS_CVP_GRANTED=1 and HUNTRESS_CVP_ORG_ID=<approved-org-id> in the environment. When these variables are unset, behavior is unchanged (back-compat).
Wave 6 (2026-05-17 — 2026-05-18) — engagement-derived items (10 commits): program_selector saturation/asset-type scoring; Phase 0.5 disclosed-report dedup analysis library; orchestrator EV checkpoint wiring; Salesforce Aura RPC validator; auth_param_validation validator; SPA bundle endpoint extraction; acquisition / shared-infra detector; Subzy false-positive filter; WAF bypass budget cap (20 min per target); mobile APK endpoint extractor (apktool + strings + scoped grep + redacted-credentials).
v17 (May 2026) — closed every remaining MED audit finding: IPv4-mapped-IPv6 CIDR bypass, IDN homoglyph regression guards, a real 32-bit aliasing bug in the duplicate-checker SimHash, severity-predictor calibration staleness warnings, HTTPS-aware readiness probe (http:// and https:// probed in parallel), condition: service_healthy regex hardening with lookahead anchor, and 13 new Rust tests for sandbox + agent_browser invariants.
v16 (May 2026) — closed all remaining HIGH audit findings: 9 deterministic validators for previously pass-through vulnerability types (CSRF, OAuth redirect_uri, info disclosure, rate-limit bypass, CRLF, blind boolean SQLi, MFA bypass, SAML, WebSocket); 20 new HackerOne-shape report templates; chain-summarizer evidence preservation (FLAG, secrets, endpoints survive context pruning); open-redirect validator hardening (javascript:-scheme + protocol-relative //host detection).
v15 (May 2026) — closed all 9 BLOCKERs from a comprehensive program audit: dynamic Docker compose service discovery, expanded body-snippet capture (15 KB), production submission pathway wired to the report-template system, broken-access validator now requires differential proof.
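The Wave 7 activation rule can be pictured as a small TypeScript sketch. Aggressive payloads are used only when HUNTRESS_CVP_GRANTED=1 and HUNTRESS_CVP_ORG_ID are both set; otherwise the default payload set applies. The names CvpStatus, resolveCvpStatus, and payloadTier are hypothetical. The sketch also assumes the org ID is only checked for presence, whereas the real orchestrator wiring may validate it against the approved ID.

```typescript
// Sketch only: CvpStatus, resolveCvpStatus, and payloadTier are hypothetical
// names, not the actual Huntress API.
interface CvpStatus {
  granted: boolean;
  orgId: string | null;
}

type Env = Record<string, string | undefined>;

function resolveCvpStatus(env: Env): CvpStatus {
  const granted = env.HUNTRESS_CVP_GRANTED === "1";
  const orgId = (env.HUNTRESS_CVP_ORG_ID ?? "").trim();
  // Back-compat default: unless both variables are present, stay on the
  // standard payload set. (Assumption: the org ID is only checked for
  // presence here; a real gate may compare it against the approved ID.)
  if (!granted || orgId.length === 0) {
    return { granted: false, orgId: null };
  }
  return { granted: true, orgId };
}

// Agents pick their payload tier from the resolved status.
function payloadTier(status: CvpStatus): "standard" | "aggressive" {
  return status.granted ? "aggressive" : "standard";
}

console.log(payloadTier(resolveCvpStatus({}))); // standard
console.log(
  payloadTier(resolveCvpStatus({ HUNTRESS_CVP_GRANTED: "1", HUNTRESS_CVP_ORG_ID: "example-org" })),
); // aggressive
```

In a Node context this would be called as resolveCvpStatus(process.env).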

A single source of truth for all open and closed work lives in PIPELINE.md. A research-derived backlog distilled from the team's accumulated bug-bounty methodology corpus is summarized at PIPELINE.md §3.6 (P1-6) and tracked in detail in the project's Obsidian vault notes (developer-local).

Overview

Huntress operates on a coordinator-solver architecture:

The OrchestratorEngine (Claude Opus 4.6 by default) ingests a HackerOne bounty program's scope, rules, asset list, and bounty table; produces a ranked attack plan; and delegates work to specialist agents via a fire-and-forget dispatch pattern (5 agents in flight, the rest queued).
Specialist agents run a ReAct loop (Reason + Act) on a tier-appropriate model: Haiku for cheap, deterministic checks (recon, CORS headers, CRLF, subdomain takeover) and Sonnet for harder reasoning (SQLi, XSS, SSRF, IDOR, OAuth, JWT). Tier assignments are locked for security-sensitive agents and cannot be raised by keyword inflation.
A Rust scope validator (safe_to_test.rs, 1,468 LOC) is the single chokepoint for "is this target legal to touch" decisions. Default-deny. Wildcard, CIDR, IPv6, IP-range, and port-list support. 49 dedicated tests.
A human approval gate sits in front of every state-changing command, with per-category auto-approve rules available for users who want to run without per-command intervention.
A kill switch (kill_switch.rs) provides single-keystroke shutdown, persists across crashes, fail-safes to ACTIVE on corrupted state, and tears down all sandbox containers on activation.

The platform supports any combination of providers — Anthropic, OpenAI, Google, OpenRouter, and local Ollama — through a common ModelProvider interface. The default and recommended configuration uses Anthropic Claude exclusively.

Design principles

Principle	Implementation
User control at every step	Approval gate sits in front of every active-test command; rejection unwinds the agent's plan.
Default-deny security posture	Out-of-scope targets, unvalidated commands, and unhealthy proxies are blocked rather than passed through.
Multi-model by design	Orchestrator and per-agent models swap with zero code changes; provider failure triggers fallback.
Cost-aware	Tiered routing keeps simple agents on Haiku; budget enforcement at 90% (warn) and 100% (hard stop).
Desktop-native	Ships as a single double-click app on Linux, macOS, and Windows via Tauri 2.0 — not a CLI.
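The two budget thresholds in the "Cost-aware" row reduce to a small pure function. This is a minimal sketch, not the Huntress budget enforcer. checkBudget and BudgetState are illustrative names, and failing closed when no budget is configured is an assumption.

```typescript
// Sketch of the documented thresholds: warn at 90% of budget, hard stop at 100%.
// checkBudget and BudgetState are illustrative names.
type BudgetState = "ok" | "warn" | "stop";

function checkBudget(spentUsd: number, budgetUsd: number): BudgetState {
  if (budgetUsd <= 0) return "stop"; // assumption: no budget configured means fail closed
  const ratio = spentUsd / budgetUsd;
  if (ratio >= 1.0) return "stop"; // hard stop: no further model calls
  if (ratio >= 0.9) return "warn"; // surface a warning, keep running
  return "ok";
}

console.log(checkBudget(40, 50)); // ok   (80%)
console.log(checkBudget(46, 50)); // warn (92%)
console.log(checkBudget(50, 50)); // stop (100%)
```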

Benchmark Results

XBOW Validation Benchmark (104 Docker CTF challenges)

The XBOW Validation Benchmark is a public set of 104 dockerized vulnerability challenges with known flags. It is the closest thing the AI-bug-bounty space has to a standardized eval.

Run	Score	Date	Commit	Notes
v18	38 / 103 (36.89%)	2026-05-09	`6e8450e`	Second published run, first full 103-challenge completion (5h33m, $58.05, Sonnet 4.6). 52% L1 / 27% L2 / 13% L3.
v13.3	22 / 103 (21.4%)	2026-05-07	`f479f56`	First published run; aborted at 55/103 by credit exhaustion, so the 48 unattempted challenges count as unsolved.

What the score means: for each of the 103 challenges, a Sonnet-tier specialist agent (selected by attack-tag heuristics) was given 40 ReAct iterations and about 10 minutes of wall-clock time to extract a FLAG{...} token from a freshly built Docker container. A challenge counts as solved when the agent's reported flag matched the build-arg-injected UUID. When the build-arg injection could not be verified, the challenge also counts as solved if the agent extracted any well-formed FLAG{...} from the running container.
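The solve rule can be expressed as a predicate. This sketch assumes the injected flag takes the form FLAG{<uuid>}. isSolved and Attempt are illustrative names, and the real scorer under src/core/benchmark/ may differ in detail.

```typescript
// Sketch of the solve rule; isSolved and Attempt are hypothetical names.
const FLAG_RE = /FLAG\{[^}\s]+\}/;

interface Attempt {
  reportedFlag: string | null; // text the agent reported as its flag
  expectedUuid: string | null; // build-arg-injected UUID, or null if injection was unverified
}

function isSolved(a: Attempt): boolean {
  if (a.reportedFlag === null) return false;
  const match = FLAG_RE.exec(a.reportedFlag);
  if (match === null) return false; // no well-formed FLAG{...} token
  if (a.expectedUuid !== null) {
    // Injection verified: the token must carry exactly the injected UUID
    // (assumes the flag format is FLAG{<uuid>}).
    return match[0] === `FLAG{${a.expectedUuid}}`;
  }
  // Injection unverified: any well-formed token counts. Checking that it
  // really came from the running container is outside this sketch.
  return true;
}
```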

What it does not measure: real-world HackerOne triage outcomes. XBOW challenges are CTFs with known winning paths; HackerOne programs are open-ended targets where finding a valid bug is half the work and writing a triagable report is the other half. A high XBOW score is necessary but not sufficient for first-submission success.

The full per-challenge breakdown, agent dispatch traces, and SQLite-persisted run history are stored under ~/.local/share/huntress/benchmarks/xbow/ after a run completes.

Reproducing the benchmark

# Clone the XBOW benchmark suite (one-time, ~2 GB)
git clone https://github.com/xbow-engineering/validation-benchmarks ~/xbow-benchmark

# Run the full 104-challenge sweep (cost: ~$70-100 in Anthropic API spend, ~6h wall clock)
npm run benchmark:xbow -- --benchmark-dir ~/xbow-benchmark --model claude-sonnet-4-6

Per-challenge timeout, parallelism, and iteration cap are configurable via BenchmarkConfig. Each attempt streams JSONL traces to ~/.local/share/huntress/benchmarks/xbow/<run-id>/<challenge>.jsonl for post-hoc inspection.

Architecture

+-----------------------------------------------------------------------+
|                       Huntress Desktop Application                    |
+-----------------------------------------------------------------------+
|                                                                       |
|   +---------------------------------------------------------------+   |
|   |                Frontend (React 19 / TypeScript)               |   |
|   |   Chat | Agent Status | Findings | Reports | Settings | PTY   |   |
|   +-------------------------------+-------------------------------+   |
|                                   |                                   |
|                            Tauri IPC Bridge                           |
|                                   |                                   |
|   +-------------------------------v-------------------------------+   |
|   |                  Backend (Rust / Tauri 2.0)                   |   |
|   |   Scope Validator | PTY Manager | Kill Switch | Sandbox Mgr   |   |
|   |   Secure Storage  | Proxy Pool  | H1 API      | Agent Browser |   |
|   +-------------------------------+-------------------------------+   |
|                                   |                                   |
|   +-------------------------------v-------------------------------+   |
|   |             AI Orchestration Layer (TypeScript)               |   |
|   |   OrchestratorEngine | AgentRouter | ReAct Loop | Cost Router |   |
|   |   Adviser | Refiner  | Toolcall_fixer | Chain Summarizer      |   |
|   +-------------------------------+-------------------------------+   |
|                                   |                                   |
|   +-------------------------------v-------------------------------+   |
|   |                    Specialist Agent Fleet (27)                |   |
|   |   XSS · SQLi · SSRF · IDOR · OAuth · JWT · SSTI · XXE · ...   |   |
|   +-------------------------------+-------------------------------+   |
|                                   |                                   |
|   +-------------------------------v-------------------------------+   |
|   |                          Data Layer                           |   |
|   |   Qdrant (vectors) | SQLite (knowledge graph + benchmark)     |   |
|   |   OS Keychain (secrets) | Asciinema (audit trail)             |   |
|   +---------------------------------------------------------------+   |
|                                                                       |
+-----------------------------------------------------------------------+

The dispatch loop

The orchestrator builds an ordered task list from the program's scope and attack-surface analysis.
dispatchAgent() fires up to 5 agents in parallel and returns immediately — finding deduplication, validation, and H1 duplicate-check all run fire-and-forget.
Each agent's ReAct loop iterates: think → call a tool → observe → update plan, with a hard cap (40 iterations by default) and a chain summarizer that pages out old context while preserving high-value evidence (flags, secrets, discovered endpoints).
Findings pass through the validator registry in validator.ts (66 registrations, mostly deterministic active probes rather than LLM grading) before the user ever sees them. Confidence scores are adjusted based on real server evidence; there is no auto-confirm-on-agent-claim shortcut.
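The fire-and-forget dispatch from the loop above can be sketched as follows: at most five tasks in flight, the rest queued, and a failed agent releasing its slot instead of propagating an error. Dispatcher and AgentTask are hypothetical names. The real dispatchAgent() also handles dedup, validation, and duplicate checks, which this sketch omits.

```typescript
// Sketch only: Dispatcher and AgentTask are hypothetical names; the real
// dispatchAgent() lives in src/core/orchestrator/.
type AgentTask = () => Promise<void>;

class Dispatcher {
  private readonly queue: AgentTask[] = [];
  private readonly maxInFlight: number;
  private inFlight = 0;

  constructor(maxInFlight = 5) {
    this.maxInFlight = maxInFlight;
  }

  // Returns immediately: the task starts now if a slot is free, else it waits.
  dispatch(task: AgentTask): void {
    this.queue.push(task);
    this.pump();
  }

  get active(): number {
    return this.inFlight;
  }

  get pending(): number {
    return this.queue.length;
  }

  private pump(): void {
    while (this.inFlight < this.maxInFlight && this.queue.length > 0) {
      const task = this.queue.shift() as AgentTask;
      this.inFlight++;
      void this.run(task); // fire-and-forget: the caller never awaits this
    }
  }

  private async run(task: AgentTask): Promise<void> {
    try {
      await task();
    } catch (err: unknown) {
      // A failing agent is logged; the error never reaches the caller.
      console.error("agent failed:", err);
    } finally {
      this.inFlight--;
      this.pump(); // hand the freed slot to the next queued task
    }
  }
}
```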

Multi-model providers

ModelProvider (interface)
├── AnthropicProvider   (Claude Opus 4.x, Sonnet 4.x, Haiku 4.x)
├── OpenAIProvider      (GPT-4o, GPT-4o-mini, o3)
├── GoogleProvider      (Gemini 2.5 Pro, Gemini Flash)
├── OpenRouterProvider  (any model exposed via OpenRouter)
└── LocalProvider       (Ollama — Llama, Mistral, Qwen, etc.)

The default configuration uses Anthropic Claude exclusively because the team only validates against Anthropic models. Switching providers requires only a Settings change; no code edit.
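Provider fallback ("provider failure triggers fallback" in the design principles) can be sketched as a wrapper that implements the same interface and tries an ordered chain. The complete() method and the FallbackProvider class are assumptions for illustration. The actual ModelProvider interface in src/core/providers/ is not reproduced here.

```typescript
// Minimal sketch; `complete` and FallbackProvider are assumed names.
interface ModelProvider {
  readonly name: string;
  complete(prompt: string): Promise<string>;
}

class FallbackProvider implements ModelProvider {
  readonly name: string;
  private readonly chain: ModelProvider[];

  constructor(chain: ModelProvider[]) {
    if (chain.length === 0) throw new Error("fallback chain is empty");
    this.chain = chain;
    this.name = `fallback(${chain.map((p) => p.name).join(" -> ")})`;
  }

  async complete(prompt: string): Promise<string> {
    const errors: string[] = [];
    for (const provider of this.chain) {
      try {
        return await provider.complete(prompt);
      } catch (err: unknown) {
        // Record the failure and move on to the next provider in order.
        errors.push(`${provider.name}: ${String(err)}`);
      }
    }
    throw new Error(`all providers failed: ${errors.join("; ")}`);
  }
}
```

Because the wrapper is itself a ModelProvider, an orchestrator can use it wherever a single provider is expected.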

Capabilities

Vulnerability hunting

27 specialist agents (+ 2 utility agents: Auth Worker and Recon) covering OWASP Top 10 plus emerging classes (prototype pollution, prompt injection, SAML signature wrapping, HTTP smuggling, race conditions).
66 validator registrations across 62 unique vulnerability types in validator.ts. Most are deterministic active probes:
- XSS uses Playwright dialog detection.
- SQLi re-executes payloads with a timing diff.
- SSRF, XXE, and command injection use OOB callbacks via interactsh.
- CSRF replays the request with an attacker Origin.
- OAuth has 4 deterministic types.
- Cache poisoning requires a 3-step proof.
- JWT covers none and alg-confusion forgery.
- Race conditions fire 20 concurrent requests with distribution analysis.
Wave 7 added the SSO realm 3-state oracle, cross-realm token confused-deputy, UUIDv1 leak, CORS method-set misadvertisement, auth-bypass-headers, HTTP-verb-confusion, and admin-debug-paths validators. A small set is intentionally pass-through where deterministic verification is impractical: subdomain_takeover is heuristic-only, and some csrf cases and the stateful blind-boolean family require custom scaffolding.
Active probing only: every confirmed finding has hard server evidence (a triggered alert, a leaked secret, a successful injection with a 2xx); the agent's confidence alone never produces confirmed=true.
Cross-agent knowledge sharing via a Blackboard pattern — the IDOR hunter learns about the JWT structure the auth-worker discovered without re-fetching it.
WAF awareness — agents receive vendor-specific bypass strategies via injected WafContext.
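The Blackboard pattern amounts to a shared, keyed fact store with optional subscriptions. The minimal sketch below uses hypothetical names (Blackboard, post, read, subscribe) and a hypothetical key. It illustrates the pattern, not the Huntress implementation.

```typescript
// Minimal Blackboard sketch. Class, method names, and the "auth.jwt_shape"
// key are hypothetical.
type Listener = (key: string, value: unknown) => void;

class Blackboard {
  private readonly facts = new Map<string, unknown>();
  private readonly listeners = new Map<string, Listener[]>();

  // Publish a fact and notify any agent subscribed to that key.
  post(key: string, value: unknown): void {
    this.facts.set(key, value);
    for (const listener of this.listeners.get(key) ?? []) listener(key, value);
  }

  // Read a fact another agent already discovered (undefined if absent).
  read<T>(key: string): T | undefined {
    return this.facts.get(key) as T | undefined;
  }

  subscribe(key: string, listener: Listener): void {
    const list = this.listeners.get(key) ?? [];
    list.push(listener);
    this.listeners.set(key, list);
  }
}

interface JwtShape {
  alg: string;
  claims: string[];
}

// The auth worker publishes the token structure it decoded...
const board = new Blackboard();
const observed: JwtShape = { alg: "RS256", claims: ["sub", "org_id"] };
board.post("auth.jwt_shape", observed);

// ...and the IDOR hunter reuses it instead of re-fetching a token.
const shape = board.read<JwtShape>("auth.jwt_shape");
console.log(shape?.alg); // RS256
```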

Reporting and submission

HackerOne-shape report templates for 30 vulnerability types covering Summary, Vulnerability Details, Prerequisites, Steps to Reproduce, HTTP Evidence, Expected vs Actual, Proof of Concept, Impact, Affected Scope, and Remediation.
Real CVSS 3.1 calculator wired into report generation; vector strings included.
Duplicate detection against H1 hacktivity (verified live), GitHub Security Advisories, and the local Qdrant memory; uses 64-bit FNV-1a SimHash with proper bigint shifts (after a v17 fix to a previous 32-bit-aliasing bug).
Severity prediction with calibration-staleness warnings — the predictor flags when its 2025 industry-average baseline has aged past 365 days and historical data is sparse.
HackerOne API integration with attachment upload, draft preview, and a ReportReviewModal submission gate that blocks duplicate-skip, F-grade quality scores, missing descriptions, or insufficient reproduction steps.
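The duplicate-detection fingerprint can be sketched as follows. FNV-1a gives each token a 64-bit hash. SimHash then adds +1 or -1 per bit position across tokens and keeps the bits with a positive total. Near-duplicates show up as a small Hamming distance between fingerprints. The sketch uses bigint to keep all 64 bits, because JavaScript's number bitwise operators truncate to 32 bits. That truncation is a likely source of the kind of aliasing bug fixed in v17. The tokenization (lowercase, split on non-word characters) and the function names are assumptions, not the duplicate checker's actual code.

```typescript
// Sketch of 64-bit SimHash over FNV-1a token hashes using bigint.
// Function names and tokenization are illustrative.
const MASK64 = (1n << 64n) - 1n;
const FNV64_OFFSET = 0xcbf29ce484222325n;
const FNV64_PRIME = 0x100000001b3n;

function fnv1a64(input: string): bigint {
  let hash = FNV64_OFFSET;
  for (const byte of new TextEncoder().encode(input)) {
    hash ^= BigInt(byte); // FNV-1a: xor first...
    hash = (hash * FNV64_PRIME) & MASK64; // ...then multiply, mod 2^64
  }
  return hash;
}

function simhash64(text: string): bigint {
  const weights = new Array<number>(64).fill(0);
  const tokens = text.toLowerCase().split(/\W+/).filter((t) => t.length > 0);
  for (const token of tokens) {
    const h = fnv1a64(token);
    for (let bit = 0; bit < 64; bit++) {
      weights[bit] += ((h >> BigInt(bit)) & 1n) === 1n ? 1 : -1;
    }
  }
  let fingerprint = 0n;
  for (let bit = 0; bit < 64; bit++) {
    if (weights[bit] > 0) fingerprint |= 1n << BigInt(bit);
  }
  return fingerprint;
}

function hammingDistance64(a: bigint, b: bigint): number {
  let x = (a ^ b) & MASK64;
  let count = 0;
  while (x !== 0n) {
    x &= x - 1n; // clear the lowest set bit
    count++;
  }
  return count;
}
```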

Reconnaissance and discovery

JS-rendered crawler (opt-in via useHeadlessBrowser: true on CrawlConfig) that drives a real browser via Playwright; discovers SPA endpoints HTTP-only crawlers miss (Angular / React / Vue). Default is HTTP-only for back-compat; flip the flag when targeting JS-heavy applications.
OpenAPI / Swagger / GraphQL schema parsing → endpoint catalog → task generation.
Subdomain enumeration, technology fingerprinting, parameter mining, and Nuclei template integration for known-vulnerability scanning.

Operational controls

Kill switch — atomic, persistent, fail-safe to ACTIVE; tears down sandbox containers on activation.
Approval gate with 60-second timeout, audit trail logging, and per-category auto-approve rules.
Adaptive rate controller per-domain, with WAF detection.
Stealth headers — 19 user-agent profiles, randomized.
Proxy rotation with health checking; failed proxies evicted automatically; poisoned-mutex recovery in the global pool.

Agent Fleet

Each agent self-registers via registerAgent() when its module is imported; src/agents/standardized_agents.ts is the single place that triggers those imports.

Agent	Vulnerability class	Tier
Recon Agent	Subdomain enumeration, tech fingerprinting, endpoint discovery	Haiku
Auth Worker Agent	Login flow detection, session acquisition, token refresh	Sonnet
OAuth (4 sub-modules)	redirect_uri override, missing state, PKCE downgrade, scope escalation	Sonnet
SSRF Hunter	Server-side request forgery, internal service access	Sonnet
XSS Hunter	Reflected, stored, and DOM-based cross-site scripting	Sonnet
SQLi Hunter	Error-based, blind boolean, blind time-based across multiple engines	Sonnet
NoSQL Hunter	NoSQL injection (MongoDB, CouchDB, etc.)	Sonnet
GraphQL Hunter	Introspection, batching, nested query depth	Haiku
IDOR Hunter	Insecure direct object references, BOLA, two-account access proof	Sonnet
SSTI Hunter	Server-side template injection (Jinja2, Freemarker, Velocity, Pug)	Sonnet
Command Injection Hunter	OS command execution via user input, OOB callback verification	Sonnet
Path Traversal Hunter	Directory traversal and local file inclusion	Sonnet
CORS Hunter	Origin reflection, null origin, subdomain wildcard, credential inclusion	Haiku
Host Header Hunter	Host header injection, cache poisoning, password-reset link manipulation	Haiku
Open Redirect Hunter	URL redirect chains, javascript:-scheme detection	Haiku
Prototype Pollution Hunter	JavaScript prototype chain manipulation, gadget chains	Sonnet
CRLF Hunter	HTTP header injection via carriage-return / line-feed	Haiku
HTTP Smuggling Hunter	Request smuggling (CL.TE, TE.CL, TE.TE)	Sonnet
XXE Hunter	XML external entity injection (direct + blind via OOB)	Sonnet
JWT Hunter	Algorithm confusion, alg=none, kid injection, jwk smuggling	Sonnet
SAML Hunter	Signature wrapping, comment injection, unsigned-assertion replay	Sonnet
WebSocket Hunter	Cross-Site WebSocket Hijacking (CSWSH), origin enforcement	Haiku
Race Condition Hunter	TOCTOU, double-spend, parallel-request races (20 concurrent probes)	Sonnet
Deserialization Hunter	Java serializable, Python pickle, PHP unserialize, .NET BinaryFormatter	Sonnet
Cache Hunter	Web cache poisoning, cache deception	Sonnet
Subdomain Takeover Hunter	Dangling DNS records, unclaimed cloud resources	Haiku
MFA Bypass Hunter	OTP / TOTP / WebAuthn bypass paths	Sonnet
Business Logic Hunter	Negative-quantity, zero-cost, currency manipulation patterns	Sonnet
Prompt Injection Hunter	LLM prompt injection in AI-powered features	Sonnet

Tier assignments are enforced by COMPLEXITY_LOCKED_AGENTS in cost_router.ts for security-sensitive types — keyword-based "this looks complex, upgrade to Opus" inflation is blocked for them.
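A complexity lock can override keyword-based upgrades in a few lines. In this sketch, the tier map, the locked set, and the keyword pattern are illustrative; they are not the contents of cost_router.ts.

```typescript
// Sketch: keyword-based upgrades are ignored for locked agent types.
// Tier map, locked set, and keywords are illustrative assumptions.
type Tier = "haiku" | "sonnet" | "opus";

const DEFAULT_TIER: Record<string, Tier> = {
  recon: "haiku",
  cors: "haiku",
  sqli: "sonnet",
  jwt: "sonnet",
};

const COMPLEXITY_LOCKED_AGENTS: ReadonlySet<string> = new Set(["sqli", "jwt"]);
const UPGRADE_KEYWORDS = /\b(complex|chained|multi-step)\b/i;

function selectTier(agentType: string, taskDescription: string): Tier {
  const base = DEFAULT_TIER[agentType] ?? "sonnet";
  // Locked agents keep their assigned tier no matter what the task text says.
  if (COMPLEXITY_LOCKED_AGENTS.has(agentType)) return base;
  return UPGRADE_KEYWORDS.test(taskDescription) ? "opus" : base;
}
```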

Security Model

Defense in depth, with multiple independent enforcement layers:

Scope validation engine

safe_to_test.rs (1,468 LOC) parses HackerOne JSON scope and enforces a strict default-deny policy. It supports wildcard domains, CIDR notation (IPv4 + IPv6), IP ranges, port lists, and per-host port restrictions. Recent hardening includes IPv4-mapped IPv6 normalization (::ffff:a.b.c.d no longer bypasses an IPv4 CIDR), full IDN punycode canonicalization, and homoglyph differentiation. 49 dedicated tests cover both positive AND negative cases.
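The IPv4-mapped IPv6 normalization works like this for the IPv4 case: strip the ::ffff: wrapper before matching, and deny anything that fails to parse. This is a TypeScript illustration of the idea, not a port of safe_to_test.rs. It covers only dotted-quad mapped addresses and IPv4 CIDRs, and the helper names are hypothetical.

```typescript
// Sketch: IPv4 CIDR matching with IPv4-mapped IPv6 normalization.
// Default-deny: anything unparseable is treated as out of scope.
function parseIPv4(s: string): number | null {
  const parts = s.split(".");
  if (parts.length !== 4) return null;
  let value = 0;
  for (const p of parts) {
    if (!/^\d{1,3}$/.test(p)) return null;
    const octet = Number(p);
    if (octet > 255) return null;
    value = value * 256 + octet; // arithmetic, not <<, to stay unsigned
  }
  return value;
}

// "::ffff:10.0.0.5", "0:0:0:0:0:ffff:10.0.0.5" and "[::FFFF:10.0.0.5]" all
// collapse to "10.0.0.5". (Hex-form mapped addresses are not handled here.)
function normalizeHost(host: string): string {
  const h = host.trim().toLowerCase().replace(/^\[|\]$/g, "");
  const mapped = /^(?:::|(?:0{1,4}:){5})ffff:(\d{1,3}(?:\.\d{1,3}){3})$/.exec(h);
  return mapped ? mapped[1] : h;
}

function inIPv4Cidr(host: string, cidr: string): boolean {
  const [base, prefixStr] = cidr.split("/");
  const ip = parseIPv4(normalizeHost(host));
  const net = parseIPv4(base);
  const prefix = Number(prefixStr);
  if (ip === null || net === null) return false; // unparseable: deny
  if (!Number.isInteger(prefix) || prefix < 0 || prefix > 32) return false;
  const blockSize = 2 ** (32 - prefix); // addresses per CIDR block
  return Math.floor(ip / blockSize) === Math.floor(net / blockSize);
}
```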

Human approval gate

Before any state-changing command executes against a live target, an ApproveDenyModal presents the exact command, the requesting agent, the target, the safety category, and any warnings. The user can approve, deny, modify, or pause. Per-category auto-approve rules are available behind an explicit opt-in confirmation dialog. The approval promise has a 60-second timeout backed by an audit-trail log.
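An approval promise can race a 60-second timer, with every outcome written to an audit log. In this sketch, treating a timeout as a denial is an assumption chosen to stay default-deny. requestApproval, Decision, and AuditEntry are hypothetical names.

```typescript
// Sketch of an approval gate with a timeout. Names are hypothetical, and
// timeout-as-deny is an assumption, not documented Huntress behavior.
type Decision = "approve" | "deny" | "timeout";

interface AuditEntry {
  command: string;
  decision: Decision;
  at: string; // ISO-8601 timestamp
}

async function requestApproval(
  command: string,
  userDecision: Promise<"approve" | "deny">,
  audit: AuditEntry[],
  timeoutMs = 60_000,
): Promise<boolean> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<Decision>((resolve) => {
    timer = setTimeout(() => resolve("timeout"), timeoutMs);
  });
  try {
    const decision: Decision = await Promise.race([userDecision, timeout]);
    audit.push({ command, decision, at: new Date().toISOString() });
    return decision === "approve"; // deny and timeout both block execution
  } finally {
    clearTimeout(timer); // don't keep the process alive after a decision
  }
}
```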

Kill switch

kill_switch.rs uses atomic state + file persistence with fsync. Activation broadcasts to all subscribers and calls Sandbox::destroy_all(). The fail-safe on a corrupted state file defaults to ACTIVE — the safest possible state.
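The fail-safe read can be sketched as a parser that returns INACTIVE only for a well-formed record that explicitly says so, and ACTIVE for everything else. The JSON shape ({"active": false}) and the function name are assumptions; the on-disk format used by kill_switch.rs is not documented here.

```typescript
// Sketch of fail-safe state loading. The record shape is an assumption.
type KillSwitchState = "ACTIVE" | "INACTIVE";

function loadKillSwitchState(raw: string): KillSwitchState {
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch {
    return "ACTIVE"; // unreadable or corrupted file: fail safe
  }
  const isRecord = typeof parsed === "object" && parsed !== null;
  if (isRecord && (parsed as { active?: unknown }).active === false) {
    return "INACTIVE"; // only an explicit, well-typed "off" disarms the switch
  }
  return "ACTIVE"; // missing field, wrong type, true, null, arrays...
}
```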

Command execution

pty_manager.rs uses CommandBuilder with explicit argv arrays — never shell string interpolation. Validates against dangerous characters (|, &, ;, $) and sanitizes environment variables. The training command path additionally locks Python invocations to scripts under allowed directories.

Secure credential storage

secure_storage.rs uses AES-256-GCM with HKDF key derivation and per-encryption random nonces. SettingsContext strips apiKeys before any localStorage.setItem(). Credentials are never written to disk in plaintext, never logged, and never included in error messages.
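The same construction can be illustrated with Node's built-in crypto module: HKDF key derivation plus AES-256-GCM with a random 96-bit nonce per encryption. This is not the Rust code, which uses ring. The HKDF info label, the salt handling, and the nonce | tag | ciphertext layout are all assumptions.

```typescript
// Illustration only (Node crypto, not the Rust `ring` implementation).
// Info label and blob layout are assumptions.
import { createCipheriv, createDecipheriv, hkdfSync, randomBytes } from "node:crypto";

function deriveKey(masterSecret: Buffer, salt: Buffer): Buffer {
  // HKDF-SHA256 -> 32-byte key for AES-256; the info label is a placeholder.
  return Buffer.from(hkdfSync("sha256", masterSecret, salt, "huntress-secure-storage", 32));
}

function encrypt(key: Buffer, plaintext: string): Buffer {
  const nonce = randomBytes(12); // fresh per encryption; never reuse under one key
  const cipher = createCipheriv("aes-256-gcm", key, nonce);
  const body = Buffer.concat([cipher.update(plaintext, "utf8"), cipher.final()]);
  return Buffer.concat([nonce, cipher.getAuthTag(), body]); // 12 + 16 + n bytes
}

function decrypt(key: Buffer, blob: Buffer): string {
  const nonce = blob.subarray(0, 12);
  const tag = blob.subarray(12, 28);
  const decipher = createDecipheriv("aes-256-gcm", key, nonce);
  decipher.setAuthTag(tag);
  // final() throws if the ciphertext or tag was tampered with.
  return Buffer.concat([decipher.update(blob.subarray(28)), decipher.final()]).toString("utf8");
}
```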

Sandbox isolation

sandbox.rs creates Docker / Podman containers with ReadonlyRootfs, all capabilities dropped (only NET_RAW added), no-new-privileges, non-root user, 2 GB memory cap, 1 CPU, 256 PIDs. Every blocked-prefix and blocked-exact env var is explicitly tested. Tinyproxy enforces scope on shell tools (curl, wget, git, pip, etc.).
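The documented limits map onto Docker Engine API container-create fields, which bollard exposes under the same names. This sketch reads "2 GB" as 2 GiB. The image, UID, and env-var blocklists are illustrative; the actual blocked prefixes and names in sandbox.rs are not listed in this README.

```typescript
// Sketch of a container-create body with the documented limits.
// Image, UID, and env blocklists are illustrative assumptions.
interface HostConfig {
  ReadonlyRootfs: boolean;
  CapDrop: string[];
  CapAdd: string[];
  SecurityOpt: string[];
  Memory: number; // bytes
  NanoCpus: number; // 1e9 = one CPU
  PidsLimit: number;
}

interface CreateContainerBody {
  Image: string;
  User: string;
  Env: string[];
  HostConfig: HostConfig;
}

const BLOCKED_ENV_PREFIXES = ["AWS_", "ANTHROPIC_", "HUNTRESS_"]; // hypothetical
const BLOCKED_ENV_EXACT = new Set(["HACKERONE_API_TOKEN", "SSH_AUTH_SOCK"]); // hypothetical

function filterEnv(env: Record<string, string>): string[] {
  return Object.entries(env)
    .filter(([k]) => !BLOCKED_ENV_EXACT.has(k) && !BLOCKED_ENV_PREFIXES.some((p) => k.startsWith(p)))
    .map(([k, v]) => `${k}=${v}`);
}

function sandboxBody(image: string, env: Record<string, string>): CreateContainerBody {
  return {
    Image: image,
    User: "1000:1000", // non-root (UID/GID are placeholders)
    Env: filterEnv(env),
    HostConfig: {
      ReadonlyRootfs: true,
      CapDrop: ["ALL"],
      CapAdd: ["NET_RAW"], // the only capability added back
      SecurityOpt: ["no-new-privileges"],
      Memory: 2 * 1024 ** 3, // 2 GiB
      NanoCpus: 1_000_000_000, // 1 CPU
      PidsLimit: 256,
    },
  };
}
```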

Proxy rotation

proxy_pool.rs supports HTTP, HTTPS, and SOCKS5 with round-robin / random / least-recently-used / fastest-first strategies. Continuous health checking. Failed proxies evicted from the active pool. Poisoned-mutex recovery instead of process-wide panic.
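The four selection strategies can be sketched over the healthy subset of the pool. Returning null when no proxy is healthy mirrors the default-deny principle that unhealthy proxies block traffic rather than let it pass through. The eviction threshold of three consecutive failed checks and all names are assumptions.

```typescript
// Sketch of proxy selection; names and the 3-failure eviction rule are assumed.
type Strategy = "round-robin" | "random" | "least-recently-used" | "fastest-first";

interface PoolProxy {
  url: string; // e.g. "socks5://127.0.0.1:9050"
  healthy: boolean;
  lastUsedMs: number;
  latencyMs: number;
  failures: number;
}

class ProxyPool {
  private readonly proxies: PoolProxy[];
  private cursor = 0;

  constructor(urls: string[]) {
    this.proxies = urls.map((url) => ({ url, healthy: true, lastUsedMs: 0, latencyMs: Infinity, failures: 0 }));
  }

  // Feed in a health-check result; 3 consecutive failures evict the proxy.
  recordCheck(url: string, ok: boolean, latencyMs: number): void {
    const p = this.proxies.find((x) => x.url === url);
    if (!p) return;
    p.failures = ok ? 0 : p.failures + 1;
    p.latencyMs = ok ? latencyMs : Infinity;
    p.healthy = p.failures < 3;
  }

  next(strategy: Strategy, nowMs: number): PoolProxy | null {
    const live = this.proxies.filter((p) => p.healthy);
    if (live.length === 0) return null; // caller must block, never go direct
    let chosen: PoolProxy;
    switch (strategy) {
      case "round-robin":
        chosen = live[this.cursor++ % live.length];
        break;
      case "random":
        chosen = live[Math.floor(Math.random() * live.length)];
        break;
      case "least-recently-used":
        chosen = live.reduce((a, b) => (b.lastUsedMs < a.lastUsedMs ? b : a));
        break;
      case "fastest-first":
        chosen = live.reduce((a, b) => (b.latencyMs < a.latencyMs ? b : a));
        break;
      default:
        throw new Error(`unknown strategy: ${String(strategy)}`);
    }
    chosen.lastUsedMs = nowMs;
    return chosen;
  }
}
```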

Audit trail

All command executions are recorded in asciinema format. Decision logs, agent reasoning traces, and tool invocations are captured for post-session review and (opt-in) training-data collection.
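asciinema's asciicast v2 format is newline-delimited JSON: a header object followed by [elapsed_seconds, "o", data] output events. Below is a minimal writer sketch. The class and method names are illustrative, and the actual recorder is in Rust.

```typescript
// Sketch of an asciicast v2 writer: header line, then output events.
class AsciicastWriter {
  private readonly lines: string[] = [];
  private readonly startMs: number;

  constructor(width: number, height: number, startMs: number) {
    this.startMs = startMs;
    // Header: format version, terminal size, and Unix start time in seconds.
    this.lines.push(JSON.stringify({ version: 2, width, height, timestamp: Math.floor(startMs / 1000) }));
  }

  // Append one output event, timed in seconds relative to the recording start.
  output(data: string, atMs: number): void {
    const seconds = (atMs - this.startMs) / 1000;
    this.lines.push(JSON.stringify([seconds, "o", data]));
  }

  toString(): string {
    return this.lines.join("\n") + "\n";
  }
}
```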

Technology Stack

Backend (Rust 1.70+)

Component	Crate	Purpose
Desktop runtime	`tauri 2.0`	Native packaging, IPC, system integration
Async runtime	`tokio`	Concurrent I/O
Cryptography	`ring`	Key derivation, AES-GCM
HTTP client	`reqwest`	Outbound requests with proxy support
Container API	`bollard`	Docker / Podman sandbox management
Subprocess PTY	`portable-pty`	Isolated execution with recording
Embedded SQL	`rusqlite`	Knowledge graph + benchmark persistence
Errors	`thiserror`	Typed hierarchies (`anyhow` only at binary entry)
Logging	`tracing` + `tracing-subscriber`	Structured logs with spans

Frontend (TypeScript 5.8 strict)

Component	Library	Purpose
UI framework	React 19	Component architecture
Build tool	Vite 7	Dev server + production bundle
Styling	Tailwind CSS 4	Utility-first dark theme
Terminal	xterm.js (`@xterm/xterm` + `addon-fit` + `addon-web-links`)	Embedded PTY view — full ANSI rendering for tool output (nmap, sqlmap, nuclei, ffuf, dalfox, gobuster), clickable URLs, fit-to-panel resize
Charts	Recharts	Benchmark + cost dashboards
Virtual scrolling	react-virtuoso	Large finding lists
Markdown	react-markdown	Report rendering
Testing	Vitest + Testing Library	Unit + integration

AI and data

Component	Technology	Purpose
Orchestrator	Anthropic Claude Opus 4.6	Coordination, planning, synthesis
Specialist agents	Claude Sonnet 4.6 / Haiku 4.5	Per-tier agent reasoning
Vector database	Qdrant	Semantic dedup, technique recall
Browser automation	Playwright	XSS dialog detection, JS-rendered crawl
OOB infrastructure	interactsh + Burp Collaborator + DNS canary	Blind SSRF / XXE / RCE confirmation
H1 integration	HackerOne REST API	Program import, report submission

Installation

Prerequisites

Requirement	Minimum	Notes
OS	Linux (Kali tested)	macOS and Windows supported via Tauri
Node.js	18+	Frontend build
Rust	1.70+ stable	Backend compilation
Docker	20+	Qdrant + sandbox containers
Python	3.10+	Optional — only for the experimental training pipeline
NVIDIA GPU (24 GB+ VRAM)	—	Optional — only for local LoRA fine-tuning

Quick start

git clone https://github.com/JBWolfFlow/Huntress.git
cd Huntress

# One-time bootstrap (installs system tooling, sets up directories)
chmod +x scripts/setup.sh
./scripts/setup.sh

# Node dependencies
npm install

# Start Qdrant in the background
docker compose up -d

# Launch the development build
npm run tauri dev

Production build

npm run tauri build
# Binary:    src-tauri/target/release/
# Installers: src-tauri/target/release/bundle/{deb,AppImage,dmg,msi}/

Optional: install the security tooling suite

chmod +x scripts/install_security_tools.sh
./scripts/install_security_tools.sh

This installs nmap, sqlmap, gobuster, ffuf, nuclei, subfinder, httpx, dnsx, wafw00f, and the rest of the agent toolkit.

Configuration

First-run setup

The setup wizard collects:

AI provider — pick from Anthropic / OpenAI / Google / OpenRouter / Local (Ollama).
API key — written to the OS keychain via Tauri's secure-storage abstraction; never to disk in plaintext.
Per-agent model overrides (optional) — defaults are cost-tier-optimized.
HackerOne API token (optional) — required only for direct report submission.

All settings persist between sessions and can be edited from the Settings panel.

Environment variables

# AI providers (at least one required)
ANTHROPIC_API_KEY=sk-ant-...
OPENAI_API_KEY=sk-...
GOOGLE_AI_API_KEY=...
OPENROUTER_API_KEY=...

# HackerOne (optional)
HACKERONE_API_TOKEN=...

# Vector DB
QDRANT_URL=http://localhost:6333

# Experimental training pipeline (off by default)
EXPERIMENTAL_TRAINING=1
HTB_API_TOKEN=...
HUGGINGFACE_TOKEN=...

Scope file format

Standard HackerOne JSON:

{
  "targets": {
    "in_scope": [
      { "asset_identifier": "*.example.com", "asset_type": "URL", "eligible_for_bounty": true },
      { "asset_identifier": "192.0.2.0/24",  "asset_type": "CIDR", "eligible_for_bounty": true }
    ],
    "out_of_scope": [
      { "asset_identifier": "admin.example.com", "asset_type": "URL" }
    ]
  }
}

Wildcards, CIDR (IPv4 + IPv6), IP ranges, port specifications, and IDN domains are fully supported. The scope validator normalizes IPv4-mapped IPv6 (::ffff:a.b.c.d) and IDN/punycode forms so neither can be used as a bypass.
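Evaluating a hostname against the scope file above can be sketched under several stated assumptions:
- out_of_scope entries always win.
- *.example.com matches subdomains but not the apex.
- Identifiers are bare hostnames rather than full URLs.
- Anything unmatched is denied.

CIDR assets and punycode conversion are left out, and the names are hypothetical.

```typescript
// Sketch of host-vs-scope evaluation; precedence and wildcard semantics
// are assumptions, not confirmed behavior of safe_to_test.rs.
interface ScopeAsset {
  asset_identifier: string;
  asset_type: string;
  eligible_for_bounty?: boolean;
}

interface ScopeFile {
  targets: { in_scope: ScopeAsset[]; out_of_scope: ScopeAsset[] };
}

function hostMatches(host: string, pattern: string): boolean {
  const h = host.toLowerCase().replace(/\.$/, ""); // drop trailing root dot
  const p = pattern.toLowerCase().replace(/\.$/, "");
  if (p.startsWith("*.")) {
    const suffix = p.slice(1); // ".example.com"
    return h.endsWith(suffix) && h.length > suffix.length;
  }
  return h === p;
}

function isInScope(scope: ScopeFile, host: string): boolean {
  const matches = (a: ScopeAsset): boolean => a.asset_type === "URL" && hostMatches(host, a.asset_identifier);
  if (scope.targets.out_of_scope.some(matches)) return false; // exclusions win
  return scope.targets.in_scope.some(matches); // default-deny otherwise
}
```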

Docker services

# Core services
docker compose up -d

# Add the OWASP Juice Shop testing target
docker compose --profile testing up -d

Service	Port	Purpose
Qdrant (REST)	6333	Vector database
Qdrant (gRPC)	6334	High-performance vector interface
Juice Shop	3001	Local testing target (testing profile only)

Usage

Hunt workflow

Import a program. New Hunt → paste a HackerOne URL, drop in a scope JSON, or enter scope manually. The orchestrator scrapes the program page (or reads the JSON), parses scope and rules, and produces a structured briefing.
Choose a strategy. The orchestrator presents 3–5 ranked attack strategies based on asset types, historical bounty data, and detected technologies. Pick one or type a custom instruction.
Watch the dispatch. The Agent Status panel shows live agent state. The chat displays findings, agent reasoning, and orchestrator updates as they happen.
Approve commands. Active-test commands surface in an ApproveDenyModal with full context. Approve, deny, modify, or pause.
Triage findings. Confirmed findings appear with severity badges, validation status, duplicate-check results, and quality scores. Drill into any finding for full evidence and HTTP exchanges.
Submit reports. For confirmed findings, the orchestrator generates a HackerOne-shape PoC report. Review in the Report Editor, edit, and submit through the integrated H1 API.

Operating modes

Standard mode — every active-test command requires explicit approval.
Auto-approve (per category) — passive recon auto-approved, active testing still gated.
Economy mode — Settings → Advanced → Hunt Behavior caps cost per finding.
Headless / CLI — for benchmark runs and unattended operations; see npm run benchmark:xbow and scripts/htb_runner.py.

Development

Build commands

npm run tauri dev                       # dev server with hot reload
npm run tauri build                     # production binary + installers
npm run lint                            # tsc + eslint + prettier
cd src-tauri && cargo clippy --all-targets -- -D warnings
cd src-tauri && cargo fmt

Coding standards

Rust (src-tauri/src/)

thiserror for typed errors; anyhow only at binary entry points.
Exhaustive enum matching — no wildcard _ on enums that may grow.
Arc<Mutex<T>> for shared state with minimal lock duration.
tracing for structured logging with spans.
Every Tauri command validates input before processing.
Mutex poisoning is recovered via PoisonError::into_inner() rather than .expect(), so a panic on one thread cannot cascade into a process-wide panic.

TypeScript (src/)

Strict mode; no implicit any.
Interfaces over type aliases for extensible object shapes.
async/await exclusively — no raw .then() chains.
Functional React with hooks only.
Every invoke() call has a typed command/response pair.

Command execution

argv arrays, never template-literal shell strings.
Null-byte-joined wire format: ['cmd', 'arg1'].join('\x00').
Every command passes through scope validation and the approval gate.
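The wire format can be sketched as an encoder and a decoder. The encoder refuses arguments that contain NUL, which would break framing, or the shell metacharacters the backend is described as rejecting (|, &, ;, $). The decoder splits on NUL. encodeArgv and decodeArgv are hypothetical names.

```typescript
// Sketch of the NUL-joined argv wire format; function names are hypothetical.
const FORBIDDEN = /[|&;$]/;

function encodeArgv(argv: string[]): string {
  if (argv.length === 0) throw new Error("empty argv");
  for (const arg of argv) {
    // NUL cannot appear inside an argument, so it is a safe separator.
    if (arg.includes("\x00")) throw new Error("NUL byte inside argument");
    if (FORBIDDEN.test(arg)) throw new Error(`shell metacharacter in argument: ${arg}`);
  }
  return argv.join("\x00");
}

function decodeArgv(wire: string): string[] {
  return wire.split("\x00");
}
```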

Testing

npm test                            # full TypeScript suite (Vitest)
npm run test:watch                  # watch mode
npm run test:coverage               # with coverage reporting
npm run test:live                   # integration tests (needs running services)
cd src-tauri && cargo test          # Rust suite

Coverage at commit `6e8450e` (the Current Status table above has newer counts)

Suite	Tests	Status
TypeScript	2,721 passing / 18 skipped / 0 failing	across 121 test files
Rust	143 passing / 0 failing / 1 ignored	unit + integration
`tsc --noEmit --skipLibCheck`	n/a	clean
`cargo clippy --all-targets -- -D warnings`	n/a	clean

Test categories

Category	What it covers	Config
Unit	Individual modules in isolation	`vitest.config.ts` (30 s timeout)
Integration	Multi-module + external service interactions	`vitest.integration.config.ts` (120 s timeout)
Agent fleet	Every agent initializes, dispatches, and reports correctly	`agent_fleet.test.ts`
Security	Scope-deny paths, kill-switch fail-safe, approval-pipeline rejection	Multiple files
Validators	Each of 66 validators with positive AND negative cases	`_validators.test.ts`, `v16_h1_validators.test.ts`, `wave7_b.test.ts`
Provider	API-key validation, streaming, fallback chains	`provider_fallback.test.ts`
CIDR / IDN	IPv4-mapped IPv6, homoglyph, edge prefixes (`/0`, `/32`, `/128`)	`safe_to_test.rs`
SimHash	64-bit entropy invariants, near-dup grouping	`finding_dedup.test.ts`
Compose / readiness	YAML regex strictness, HTTPS-aware probe parallelism	`v17_*.test.ts`

Project Structure

huntress/
├── src/                                # Frontend (React / TypeScript)
│   ├── agents/                         # 27 specialist agents + auth + recon
│   │   ├── oauth/                      # 4 OAuth sub-modules + discovery
│   │   ├── base_agent.ts               # Abstract base + finding types
│   │   ├── agent_catalog.ts            # Registry
│   │   ├── agent_router.ts             # Selection + dispatch
│   │   └── standardized_agents.ts      # Self-registration trigger
│   ├── components/                     # React UI surfaces
│   │   ├── ChatInterface.tsx           # Primary interaction
│   │   ├── ApproveDenyModal.tsx        # Human approval gate
│   │   ├── ReportReviewModal.tsx       # Submission gate
│   │   └── ...
│   ├── core/
│   │   ├── orchestrator/               # Coordinator engine, dispatch, dedup
│   │   ├── engine/                     # ReAct loop, tool schemas, chain summarizer
│   │   ├── providers/                  # ModelProvider abstractions
│   │   ├── reporting/                  # PoC generation, templates, H1 API
│   │   ├── validation/                 # Deterministic validators + OOB server
│   │   ├── http/                       # Request engine, scope check, rate control
│   │   ├── memory/                     # Qdrant integration, hunt history
│   │   ├── benchmark/                  # XBOW runner, persistence, scoring
│   │   ├── auth/                       # Session manager, token refresh
│   │   ├── discovery/                  # Crawler, JS analyzer, schema parser
│   │   └── ...
│   └── tests/                          # 120 test files
├── src-tauri/                          # Backend (Rust / Tauri 2.0)
│   └── src/
│       ├── lib.rs                      # 50+ Tauri commands, module integration
│       ├── safe_to_test.rs             # Scope validator (1,468 LOC, 49 tests)
│       ├── pty_manager.rs              # Secure subprocess execution
│       ├── kill_switch.rs              # Emergency shutdown with persistence
│       ├── proxy_pool.rs               # HTTP/HTTPS/SOCKS5 rotation
│       ├── secure_storage.rs           # OS keychain integration
│       ├── sandbox.rs                  # Container isolation
│       ├── agent_browser.rs            # Playwright Node subprocess manager
│       ├── h1_api.rs                   # HackerOne REST client
│       └── tool_checker.rs             # Security tool availability checks
├── scripts/                            # Automation
│   ├── setup.sh                        # Bootstrap
│   ├── install_security_tools.sh       # Tool installer
│   ├── htb_runner.py                   # HackTheBox training (experimental)
│   └── deploy_production.sh            # Gradual model deployment
├── docker-compose.yml                  # Qdrant + testing services
├── PIPELINE.md                         # Single source of truth for open work
└── README.md                           # This file
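The tree lists three related pieces: a registry (`agent_catalog.ts`), a selection and dispatch layer (`agent_router.ts`), and a file that triggers self-registration (`standardized_agents.ts`). The sketch below shows one common way such a pattern can work. `AgentDescriptor`, `registerAgent`, `selectAgents`, and the two example agent ids and model tiers are hypothetical and are not the project's actual API.

```typescript
// Hypothetical self-registering agent catalog; names and fields are illustrative only.

interface AgentDescriptor {
  id: string;
  vulnClasses: string[]; // vulnerability classes this agent hunts, e.g. "xss", "ssrf"
  modelTier: "haiku" | "sonnet"; // cost tier for the agent's ReAct loop
}

const catalog = new Map<string, AgentDescriptor>();

// Called by each agent module at import time.
export function registerAgent(agent: AgentDescriptor): void {
  if (catalog.has(agent.id)) {
    throw new Error(`duplicate agent id: ${agent.id}`);
  }
  catalog.set(agent.id, agent);
}

// Router side: return every registered agent covering at least one requested class.
export function selectAgents(requested: string[]): AgentDescriptor[] {
  const wanted = new Set(requested);
  return Array.from(catalog.values()).filter((a) => a.vulnClasses.some((c) => wanted.has(c)));
}

// In a real layout these calls would live in separate agent modules, and a single file
// importing all of them would populate the catalog. The ids and tiers here are hypothetical.
registerAgent({ id: "xss_hunter", vulnClasses: ["xss"], modelTier: "sonnet" });
registerAgent({ id: "ssrf_hunter", vulnClasses: ["ssrf"], modelTier: "haiku" });
```

Rejecting duplicate ids at registration time turns a copy-pasted agent module into an import-time error. Without that check, two agents could silently compete for the same work.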

Codebase metrics (HEAD)

Metric	Value
TypeScript / TSX source files	330
TypeScript LOC (approx)	~89,000
Rust source files	11
Rust LOC	~7,300
Specialist hunting agents	27 (+ 2 utility: Auth Worker, Recon)
OAuth sub-modules	4
Validator registrations	51 (across 47 unique vulnerability types)
HackerOne report templates	30
React components	19
Tauri IPC commands	53 (across 11 Rust files)
TypeScript tests	3,857
Rust tests	159

Known Gaps

This section is intentionally explicit. Marketing-style claims that paper over real gaps make the platform less useful, not more.

Gap	What it means	Status
Zero triaged HackerOne submissions	We have not yet had a Huntress-generated report accepted, marked duplicate, marked informative, or marked N/A on a real H1 program. Every quality claim about reports is therefore unproven against live triage criteria.	Open milestone — first submission is the next major step.
Benchmark cohort coverage uneven	The v18 38/103 result is the first full-corpus completion, but XSS (4%) and SSRF (0/3) remain capability gaps. L3 challenges (1/8 = 13%) are barely sampled at this corpus size.	Targeted prompt and browser-flow work on the XSS hunter; next benchmark batch.
Quality scorer not validated against H1	`report_quality.ts` produces letter grades, but those grades have never been compared to real H1 triage outcomes. The scorer reflects what we think a good H1 report looks like, not what triagers actually accept.	Will be calibrated once submissions accumulate.
Training pipeline is experimental	The Axolotl + LoRA local fine-tuning path requires a 24 GB+ GPU and has not been validated end-to-end. It sits behind the `EXPERIMENTAL_TRAINING` feature flag and is excluded from the default test run.	Research preview only.
God-object refactors deferred	`react_loop.ts` (~2.5K LOC), `orchestrator_engine.ts` (~3.4K LOC), and `validator.ts` (~5.3K LOC) are large. No concrete maintenance pain has appeared yet, so refactoring them is deferred rather than done prematurely.	Tracked but not blocking.

The single source of truth for open and closed work is PIPELINE.md.

Disclaimer

Huntress is for authorized security testing only. This includes:

Bug bounty programs with explicit, written authorization (HackerOne, Bugcrowd, Intigriti, etc.).
Penetration-testing engagements with signed scope agreements.
Security research on systems you own or have written permission to test.
Educational use in controlled environments (CTFs, deliberately vulnerable VMs, your own lab).

Users are solely responsible for ensuring they have proper authorization before testing any target. Unauthorized access to computer systems is illegal under the Computer Fraud and Abuse Act (CFAA) in the United States and equivalent legislation in other jurisdictions. The authors and contributors assume no liability for misuse.

The default-deny scope validator is the first line of defense. The human approval gate is the second. The kill switch is the third. None of these substitute for the operator confirming, in writing, that they have authorization for every target they configure.
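The three layers named above can be modeled as a single authorization check. This is an illustrative TypeScript sketch only. The real scope validator lives in Rust (`safe_to_test.rs`), and `ScopeRule`, `isInScope`, `authorize`, and the callback parameters are assumptions. The scope check is default-deny and unwraps IPv4-mapped IPv6 addresses (`::ffff:a.b.c.d`) before CIDR matching. It handles `/0` explicitly because JavaScript masks shift counts to 5 bits.

```typescript
// Hypothetical model of the three layers; not the interfaces of safe_to_test.rs,
// ApproveDenyModal.tsx, or kill_switch.rs.

type ScopeRule =
  | { kind: "host"; value: string }
  | { kind: "cidr4"; base: string; prefix: number };

type Decision = { allowed: true } | { allowed: false; reason: string };

// Parse dotted IPv4 into an unsigned 32-bit value. IPv4-mapped IPv6 is unwrapped first
// so that it cannot bypass IPv4 rules. Anything unparseable returns null.
function parseIpv4(ip: string): number | null {
  const raw = ip.toLowerCase().startsWith("::ffff:") ? ip.slice(7) : ip;
  const parts = raw.split(".");
  if (parts.length !== 4) return null;
  let value = 0;
  for (const part of parts) {
    if (!/^\d{1,3}$/.test(part)) return null;
    const octet = Number(part);
    if (octet > 255) return null;
    value = value * 256 + octet;
  }
  return value;
}

function inCidr4(ip: number, base: number, prefix: number): boolean {
  if (prefix === 0) return true; // `x << 32` is `x << 0` in JS, so /0 needs its own case
  const mask = (0xffffffff << (32 - prefix)) >>> 0;
  return ((ip & mask) >>> 0) === ((base & mask) >>> 0);
}

function isInScope(target: string, rules: ScopeRule[]): boolean {
  const host = target.toLowerCase();
  const ip = parseIpv4(host);
  for (const rule of rules) {
    if (rule.kind === "host" && rule.value.toLowerCase() === host) return true;
    if (rule.kind === "cidr4" && ip !== null && rule.prefix >= 0 && rule.prefix <= 32) {
      const base = parseIpv4(rule.base);
      if (base !== null && inCidr4(ip, base, rule.prefix)) return true;
    }
  }
  return false; // default-deny: anything not explicitly matched is out of scope
}

// Layers run in the order listed above: scope, then human approval, then kill switch.
function authorize(
  target: string,
  rules: ScopeRule[],
  killSwitchEngaged: () => boolean,
  humanApproves: (target: string) => boolean,
): Decision {
  if (!isInScope(target, rules)) return { allowed: false, reason: "out of scope" };
  if (!humanApproves(target)) return { allowed: false, reason: "operator denied" };
  if (killSwitchEngaged()) return { allowed: false, reason: "kill switch engaged" };
  return { allowed: true };
}
```

The kill switch is checked last, so an engagement made while the operator is still deciding blocks the request anyway. A real implementation would likely also check it before prompting the operator.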

License

MIT — see LICENSE for the full text.

Acknowledgments

Tauri — desktop application framework
Anthropic — Claude AI models powering the orchestrator + specialist fleet
Qdrant — vector database
Playwright — browser automation for validation + JS-rendered crawl
Project Discovery — interactsh, nuclei, subfinder, httpx
XBOW — the public Validation Benchmark used as our primary eval harness
HackerOne — bug bounty platform and API
HackTheBox — training environment for the experimental learning pipeline

Built and maintained by JBWolfFlow under NeuroForge Technologies.
