Huntress

AI-driven bug bounty automation for HackerOne — research preview

Rust TypeScript Tauri React License: MIT XBOW


Huntress is a Tauri 2.0 desktop application that automates HackerOne bug bounty hunting through a coordinated fleet of AI agents. A primary orchestrator (Claude Opus 4.6) ingests a program's scope, plans an attack strategy, and dispatches 27 specialized vulnerability-hunting agents — each running its own ReAct loop on a cost-tiered model (Haiku or Sonnet). All target interactions pass through a Rust scope validator, a default-deny security model, and a human approval gate.

This is a research preview, not a production tool. The platform infrastructure is solid and well-tested, but no Huntress-generated report has been triaged on a real HackerOne program yet — that is the next milestone, not a past achievement. Use accordingly.


Current Status

| Indicator | Value | Source |
| --- | --- | --- |
| XBOW benchmark | 38 / 103 challenges solved (36.89%) | v18 second published run, 2026-05-09 (commit 6e8450e): 52% L1, 27% L2, 13% L3; $58.05; 5h33m; Sonnet 4.6 |
| HackerOne triaged submissions | 0 | First submission is an open milestone |
| TypeScript tests | 3,857 passing / 18 skipped / 0 failing | npx vitest run (169 files) |
| Rust tests | 159 passing / 0 failing / 1 ignored | cd src-tauri && cargo test |
| Type / lint | tsc --noEmit clean · cargo clippy --all-targets -- -D warnings clean | CI |
| Open audit findings | 0 | All 9 BLOCKERs, 7 HIGH-severity, and 8 MED-severity items closed across v15–v17 |
| Hunt history | 12 sessions (6 OWASP Juice Shop training + 6 real-world H1 programs) | Internal logs |

The XBOW number above is the v18 second published run — the first full 103-challenge completion (v13.3 aborted at 55/103 by credit exhaustion). L2 jumped from 2% → 27% and L3 from 0% → 13% versus v13.3. XSS remains the largest capability gap at 4% solve rate; see PIPELINE.md §13 for the full per-category breakdown.

What changed recently

  • Wave 7 (2026-05-18 to 2026-05-19): Anthropic CVP integration. Official Cyber Verification Program approval was received 2026-05-13 for org 64d13134-7083-4aca-a808-fb16a2d6b9d0 (dual-use cybersecurity activities, with mass-exfil and ransomware as enforced carve-outs). 9 commits shipped: A7 CVP-aware authorization preamble; A1a centralized payload library (14 carve-out safety gates); A1b orchestrator wiring of cvpStatus end-to-end; A1c-A1e per-agent payload-library migrations across 8 hunter agents (xss / sqli / ssrf / ssti / cmd_injection / proto_pollution / deserialization / path_traversal); B1+B2 SSO realm 3-state oracle and cross-realm token validators; B3+B4+B7+B8+B9 deterministic validator bundle (uuidv1_leak / cors_method_misadvertised / auth_bypass_headers / http_verb_confusion / admin_debug_paths, engagement-derived from Quixel SSO + KWS findings); and C5 adaptive agent dispatch driven by Phase 0.5 disclosed-report dedup analysis (avoid-list agents dropped, underserved-list agents priority-boosted). Set HUNTRESS_CVP_GRANTED=1 and HUNTRESS_CVP_ORG_ID=<approved-org-id> in the environment to activate aggressive payloads across all 8 migrated agents. Behavior is unchanged when the variables are unset (back-compat).
  • Wave 6 (2026-05-17 — 2026-05-18) — engagement-derived items (10 commits): program_selector saturation/asset-type scoring; Phase 0.5 disclosed-report dedup analysis library; orchestrator EV checkpoint wiring; Salesforce Aura RPC validator; auth_param_validation validator; SPA bundle endpoint extraction; acquisition / shared-infra detector; Subzy false-positive filter; WAF bypass budget cap (20 min per target); mobile APK endpoint extractor (apktool + strings + scoped grep + redacted-credentials).
  • v17 (May 2026) — closed every remaining MED audit finding: IPv4-mapped-IPv6 CIDR bypass, IDN homoglyph regression guards, a real 32-bit aliasing bug in the duplicate-checker SimHash, severity-predictor calibration staleness warnings, HTTPS-aware readiness probe (http:// and https:// probed in parallel), condition: service_healthy regex hardening with lookahead anchor, and 13 new Rust tests for sandbox + agent_browser invariants.
  • v16 (May 2026) — closed all remaining HIGH audit findings: 9 deterministic validators for previously pass-through vulnerability types (CSRF, OAuth redirect_uri, info disclosure, rate-limit bypass, CRLF, blind boolean SQLi, MFA bypass, SAML, WebSocket); 20 new HackerOne-shape report templates; chain-summarizer evidence preservation (FLAG, secrets, endpoints survive context pruning); open-redirect validator hardening (javascript:-scheme + protocol-relative //host detection).
  • v15 (May 2026) — closed all 9 BLOCKERs from a comprehensive program audit: dynamic Docker compose service discovery, expanded body-snippet capture (15 KB), production submission pathway wired to the report-template system, broken-access validator now requires differential proof.
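
The Wave 7 environment gating can be pictured with a small sketch. Only the two variable names come from this README; the `readCvpStatus` helper, the `CvpStatus` shape, and the rule that both variables must be present are assumptions made for illustration, not the project's actual code.

```typescript
// Hypothetical helper. Only HUNTRESS_CVP_GRANTED and HUNTRESS_CVP_ORG_ID
// are names taken from the README; everything else is illustrative.
type Env = Record<string, string | undefined>;

interface CvpStatus {
  granted: boolean;
  orgId: string | null;
}

function readCvpStatus(env: Env): CvpStatus {
  const orgId = (env["HUNTRESS_CVP_ORG_ID"] ?? "").trim();
  // Assumption: both variables are required. Anything else keeps the
  // default (non-CVP) behavior, matching the back-compat note above.
  const granted = env["HUNTRESS_CVP_GRANTED"] === "1" && orgId.length > 0;
  return { granted, orgId: granted ? orgId : null };
}
```

Passing the environment in as a plain record (rather than reading a global) keeps the check easy to unit-test.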

A single source of truth for all open and closed work lives in PIPELINE.md. A research-derived backlog distilled from the team's accumulated bug-bounty methodology corpus is summarized at PIPELINE.md §3.6 (P1-6) and tracked in detail in the project's Obsidian vault notes (developer-local).


Overview

Huntress operates on a coordinator-solver architecture:

  • The OrchestratorEngine (Claude Opus 4.6 by default) ingests a HackerOne bounty program's scope, rules, asset list, and bounty table; produces a ranked attack plan; and delegates work to specialist agents via a fire-and-forget dispatch pattern (5 agents in flight, the rest queued).
  • Specialist agents run a ReAct loop (Reason + Act) on a tier-appropriate model — Haiku for cheap, deterministic checks (recon, CORS headers, cache, subdomain takeover); Sonnet for harder reasoning (SQLi, XSS, SSRF, IDOR, OAuth, JWT). Tier assignments are locked for security-sensitive agents and cannot be raised by keyword inflation.
  • A Rust scope validator (safe_to_test.rs, 1,468 LOC) is the single chokepoint for "is this target legal to touch" decisions. Default-deny. Wildcard, CIDR, IPv6, IP-range, and port-list support. 49 dedicated tests.
  • A human approval gate sits in front of every state-changing command, with per-category auto-approve rules available for users who want to run without per-command intervention.
  • A kill switch (kill_switch.rs) provides single-keystroke shutdown, persists across crashes, fail-safes to ACTIVE on corrupted state, and tears down all sandbox containers on activation.

The platform supports any combination of providers — Anthropic, OpenAI, Google, OpenRouter, and local Ollama — through a common ModelProvider interface. The default and recommended configuration uses Anthropic Claude exclusively.

Design principles

| Principle | Implementation |
| --- | --- |
| User control at every step | Approval gate sits in front of every active-test command; rejection unwinds the agent's plan. |
| Default-deny security posture | Out-of-scope targets, unvalidated commands, and unhealthy proxies are blocked rather than passed through. |
| Multi-model by design | Orchestrator and per-agent models swap with zero code changes; provider failure triggers fallback. |
| Cost-aware | Tiered routing keeps simple agents on Haiku; budget enforcement at 90% (warn) and 100% (hard stop). |
| Desktop-native | Ships as a single double-click app on Linux, macOS, and Windows via Tauri 2.0, not a CLI. |
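
The two budget thresholds in the cost-aware principle (warn at 90%, hard stop at 100%) amount to a simple comparison. The sketch below is illustrative only: the function name, the `BudgetState` values, and the fail-closed handling of a non-positive limit are assumptions, not taken from the codebase.

```typescript
// Hypothetical names; only the 90% and 100% thresholds come from the README.
type BudgetState = "ok" | "warn" | "stop";

function checkBudget(spentUsd: number, limitUsd: number): BudgetState {
  // Assumption: a missing or non-positive limit fails closed.
  if (!(limitUsd > 0)) return "stop";
  const ratio = spentUsd / limitUsd;
  if (ratio >= 1.0) return "stop"; // hard stop at 100%
  if (ratio >= 0.9) return "warn"; // warning at 90%
  return "ok";
}
```

For example, with a $100 limit, $50 spent yields `"ok"`, $95 yields `"warn"`, and $100 or more yields `"stop"`.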

Benchmark Results

XBOW Validation Benchmark (104 Docker CTF challenges)

The XBOW Validation Benchmark is a public set of 104 dockerized vulnerability challenges with known flags. It is the closest thing the AI-bug-bounty space has to a standardized eval.

| Run | Score | Date | Commit | Notes |
| --- | --- | --- | --- | --- |
| v18 | 38 / 103 (36.89%) | 2026-05-09 | 6e8450e | Second published run, first full 103-challenge completion (5h33m, $58.05, Sonnet 4.6). 52% L1 / 27% L2 / 13% L3. |
| v13.3 | 22 / 103 (21.4%) | 2026-05-07 | f479f56 | First published full-corpus run; aborted at 55/103 by credit exhaustion. |

What the score means: for each of the 103 challenges, a Sonnet-tier specialist agent (selected by attack-tag heuristics) was given 40 ReAct iterations and ~10 minutes of wall-clock time to extract a FLAG{...} token from a freshly built Docker container. A challenge counts as solved when the agent's reported flag matched the build-arg-injected UUID or, when the build-arg injection could not be verified, when the agent extracted any well-formed FLAG{...} from the running container.
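
The solved rule can be written down as a small predicate. This is a sketch under stated assumptions: the regular expression for a "well-formed" flag and the function names are hypothetical, and `expectedFlag === null` stands in for "build-arg injection could not be verified".

```typescript
// Assumption: a well-formed flag is FLAG{...} with no whitespace or nested braces.
const FLAG_PATTERN = /FLAG\{[^{}\s]+\}/;

function extractFlag(agentOutput: string): string | null {
  const match = FLAG_PATTERN.exec(agentOutput);
  return match ? match[0] : null;
}

// expectedFlag is null when the injected flag could not be verified.
function isSolved(agentOutput: string, expectedFlag: string | null): boolean {
  const found = extractFlag(agentOutput);
  if (found === null) return false;
  // Verified injection: require an exact match.
  // Unverified injection: accept any well-formed flag.
  return expectedFlag === null ? true : found === expectedFlag;
}
```

The fallback branch is the looser of the two, so a score computed this way is an upper bound on strictly verified solves.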

What it does not measure: real-world HackerOne triage outcomes. XBOW challenges are CTFs with known winning paths; HackerOne programs are open-ended targets where finding a valid bug is half the work and writing a triagable report is the other half. A high XBOW score is necessary but not sufficient for first-submission success.

The full per-challenge breakdown, agent dispatch traces, and SQLite-persisted run history are stored under ~/.local/share/huntress/benchmarks/xbow/ after a run completes.

Reproducing the benchmark

# Clone the XBOW benchmark suite (one-time, ~2 GB)
git clone https://github.com/xbow-engineering/validation-benchmarks ~/xbow-benchmark

# Run the full 104-challenge sweep (cost: ~$70-100 in Anthropic API spend, ~6h wall clock)
npm run benchmark:xbow -- --benchmark-dir ~/xbow-benchmark --model claude-sonnet-4-6

Per-challenge timeout, parallelism, and iteration cap are configurable via BenchmarkConfig. Each attempt streams JSONL traces to ~/.local/share/huntress/benchmarks/xbow/<run-id>/<challenge>.jsonl for post-hoc inspection.


Architecture

+-----------------------------------------------------------------------+
|                       Huntress Desktop Application                    |
+-----------------------------------------------------------------------+
|                                                                       |
|   +---------------------------------------------------------------+   |
|   |                Frontend (React 19 / TypeScript)               |   |
|   |   Chat | Agent Status | Findings | Reports | Settings | PTY  |   |
|   +-------------------------------+-------------------------------+   |
|                                   |                                   |
|                            Tauri IPC Bridge                           |
|                                   |                                   |
|   +-------------------------------v-------------------------------+   |
|   |                  Backend (Rust / Tauri 2.0)                   |   |
|   |   Scope Validator | PTY Manager | Kill Switch | Sandbox Mgr  |   |
|   |   Secure Storage  | Proxy Pool  | H1 API      | Agent Browser |   |
|   +-------------------------------+-------------------------------+   |
|                                   |                                   |
|   +-------------------------------v-------------------------------+   |
|   |             AI Orchestration Layer (TypeScript)               |   |
|   |   OrchestratorEngine | AgentRouter | ReAct Loop | Cost Router |   |
|   |   Adviser | Refiner  | Toolcall_fixer | Chain Summarizer     |   |
|   +-------------------------------+-------------------------------+   |
|                                   |                                   |
|   +-------------------------------v-------------------------------+   |
|   |                    Specialist Agent Fleet (27)                |   |
|   |   XSS · SQLi · SSRF · IDOR · OAuth · JWT · SSTI · XXE · ...  |   |
|   +-------------------------------+-------------------------------+   |
|                                   |                                   |
|   +-------------------------------v-------------------------------+   |
|   |                          Data Layer                           |   |
|   |   Qdrant (vectors) | SQLite (knowledge graph + benchmark)     |   |
|   |   OS Keychain (secrets) | Asciinema (audit trail)             |   |
|   +---------------------------------------------------------------+   |
|                                                                       |
+-----------------------------------------------------------------------+

The dispatch loop

  1. The orchestrator builds an ordered task list from the program's scope and attack-surface analysis.
  2. dispatchAgent() fires up to 5 agents in parallel and returns immediately — finding deduplication, validation, and H1 duplicate-check all run fire-and-forget.
  3. Each agent's ReAct loop iterates: think → call a tool → observe → update plan, with a hard cap (40 iterations by default) and a chain summarizer that pages out old context while preserving high-value evidence (flags, secrets, discovered endpoints).
  4. Findings flow through 27 deterministic validators (active probes, not LLM grading) before the user ever sees them. Confidence scores are adjusted based on real server evidence — there is no auto-confirm-on-agent-claim shortcut.
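
The "up to 5 agents in flight, the rest queued" behaviour from step 2 is a bounded worker pool. The sketch below shows only that concurrency cap; it awaits all results rather than returning immediately, and `AgentTask` and `dispatchAll` are hypothetical names, not the project's `dispatchAgent()`.

```typescript
// Hypothetical task type: each task is a thunk that starts one agent run.
type AgentTask<T> = () => Promise<T>;

async function dispatchAll<T>(tasks: AgentTask<T>[], maxInFlight = 5): Promise<T[]> {
  const results = new Array<T>(tasks.length);
  let next = 0; // index of the next queued task

  // Each worker pulls tasks until the queue is empty, so at most
  // `maxInFlight` tasks are running at any moment.
  async function worker(): Promise<void> {
    while (next < tasks.length) {
      const index = next++;
      const task = tasks[index];
      if (task === undefined) return;
      results[index] = await task();
    }
  }

  const workerCount = Math.min(maxInFlight, tasks.length);
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return results;
}
```

Results keep the order of the input task list even though completion order varies. A rejected task rejects the whole call; a real dispatcher would more likely record the failure and continue.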

Multi-model providers

ModelProvider (interface)
├── AnthropicProvider   (Claude Opus 4.x, Sonnet 4.x, Haiku 4.x)
├── OpenAIProvider      (GPT-4o, GPT-4o-mini, o3)
├── GoogleProvider      (Gemini 2.5 Pro, Gemini Flash)
├── OpenRouterProvider  (any model exposed via OpenRouter)
└── LocalProvider       (Ollama — Llama, Mistral, Qwen, etc.)

The default configuration uses Anthropic Claude exclusively because the team only validates against Anthropic models. Switching providers requires only a Settings change; no code edit.
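
A provider abstraction with fallback might look roughly like this. The README names the `ModelProvider` interface but not its members, so the `name` and `complete` members and the `completeWithFallback` helper below are assumptions for illustration.

```typescript
// Hypothetical shape: the README does not list the interface's members.
interface ModelProvider {
  readonly name: string;
  complete(prompt: string): Promise<string>;
}

interface Completion {
  provider: string;
  text: string;
}

// Try providers in order; the first one that succeeds wins.
async function completeWithFallback(
  providers: ModelProvider[],
  prompt: string,
): Promise<Completion> {
  const errors: string[] = [];
  for (const provider of providers) {
    try {
      const text = await provider.complete(prompt);
      return { provider: provider.name, text };
    } catch (err) {
      const reason = err instanceof Error ? err.message : String(err);
      errors.push(`${provider.name}: ${reason}`);
    }
  }
  throw new Error(`all providers failed (${errors.join("; ")})`);
}
```

Because callers depend only on the interface, swapping the provider list changes which model answers without touching agent code.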


Capabilities

Vulnerability hunting

  • 27 specialist agents (+ 2 utility agents: Auth Worker and Recon) covering OWASP Top 10 plus emerging classes (prototype pollution, prompt injection, SAML signature wrapping, HTTP smuggling, race conditions).
  • 66 validator registrations across 62 unique vulnerability types in validator.ts. Most are deterministic active probes: XSS uses Playwright dialog detection; SQLi re-executes payloads with a timing diff; SSRF, XXE, and command injection use OOB callbacks via interactsh; CSRF validates by replaying with an attacker Origin; OAuth has 4 deterministic types; cache poisoning uses a 3-step proof; JWT covers none and alg-confusion forgery; race conditions use 20 concurrent requests with distribution analysis. Wave 7 added the SSO realm 3-state oracle, cross-realm token confused-deputy, UUIDv1 leak, CORS method-set misadvertisement, auth-bypass-headers, HTTP-verb-confusion, and admin-debug-paths validators. A small set is intentionally pass-through where deterministic verification is provably impossible (subdomain_takeover is heuristic-only; csrf and the stateful blind-boolean family require custom scaffolding).
  • Active probing only: every confirmed finding has hard server evidence (a triggered alert, a leaked secret, a successful injection with a 2xx); the agent's confidence alone never produces confirmed=true.
  • Cross-agent knowledge sharing via a Blackboard pattern — the IDOR hunter learns about the JWT structure the auth-worker discovered without re-fetching it.
  • WAF awareness — agents receive vendor-specific bypass strategies via injected WafContext.

Reporting and submission

  • HackerOne-shape report templates for 30 vulnerability types covering Summary, Vulnerability Details, Prerequisites, Steps to Reproduce, HTTP Evidence, Expected vs Actual, Proof of Concept, Impact, Affected Scope, and Remediation.
  • Real CVSS 3.1 calculator wired into report generation; vector strings included.
  • Duplicate detection against H1 hacktivity (verified live), GitHub Security Advisories, and the local Qdrant memory; uses 64-bit FNV-1a SimHash with proper bigint shifts (after a v17 fix to a previous 32-bit-aliasing bug).
  • Severity prediction with calibration-staleness warnings — the predictor flags when its 2025 industry-average baseline has aged past 365 days and historical data is sparse.
  • HackerOne API integration with attachment upload, draft preview, and a ReportReviewModal submission gate that blocks duplicate-skip, F-grade quality scores, missing descriptions, or insufficient reproduction steps.
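
The 64-bit FNV-1a SimHash mentioned in the duplicate-detection bullet can be sketched as follows. The FNV-1a constants are the standard 64-bit offset basis and prime; the tokenization and function names are assumptions, and this is not the project's implementation. The point it illustrates is that every shift stays in `bigint`, because a Number shift such as `1 << bit` wraps at 32 bits.

```typescript
const MASK_64 = (1n << 64n) - 1n;
const FNV_OFFSET_BASIS = 0xcbf29ce484222325n; // standard 64-bit FNV offset basis
const FNV_PRIME = 0x100000001b3n;             // standard 64-bit FNV prime

function fnv1a64(text: string): bigint {
  const bytes = new TextEncoder().encode(text);
  let hash = FNV_OFFSET_BASIS;
  for (let i = 0; i < bytes.length; i++) {
    hash ^= BigInt(bytes[i] ?? 0);
    hash = (hash * FNV_PRIME) & MASK_64; // keep the product within 64 bits
  }
  return hash;
}

// Assumption: tokens are lowercase runs of word characters.
function simhash64(text: string): bigint {
  const weights = new Array<number>(64).fill(0);
  const tokens = text.toLowerCase().split(/\W+/).filter((t) => t.length > 0);
  for (const token of tokens) {
    const hash = fnv1a64(token);
    for (let bit = 0; bit < 64; bit++) {
      // bigint shift, so bits 32..63 are read correctly.
      const isSet = ((hash >> BigInt(bit)) & 1n) === 1n;
      weights[bit] = (weights[bit] ?? 0) + (isSet ? 1 : -1);
    }
  }
  let fingerprint = 0n;
  for (let bit = 0; bit < 64; bit++) {
    if ((weights[bit] ?? 0) > 0) fingerprint |= 1n << BigInt(bit);
  }
  return fingerprint;
}

function hammingDistance(a: bigint, b: bigint): number {
  let diff = a ^ b;
  let count = 0;
  while (diff > 0n) {
    count += Number(diff & 1n);
    diff >>= 1n;
  }
  return count;
}
```

Two findings would then be treated as near-duplicates when the Hamming distance between their fingerprints falls under some threshold; the threshold value is not stated in this README.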

Reconnaissance and discovery

  • JS-rendered crawler (opt-in via useHeadlessBrowser: true on CrawlConfig) that drives a real browser via Playwright; discovers SPA endpoints HTTP-only crawlers miss (Angular / React / Vue). Default is HTTP-only for back-compat; flip the flag when targeting JS-heavy applications.
  • OpenAPI / Swagger / GraphQL schema parsing → endpoint catalog → task generation.
  • Subdomain enumeration, technology fingerprinting, parameter mining, and Nuclei template integration for known-vulnerability scanning.

Operational controls

  • Kill switch — atomic, persistent, fail-safe to ACTIVE; tears down sandbox containers on activation.
  • Approval gate with 60-second timeout, audit trail logging, and per-category auto-approve rules.
  • Adaptive rate controller per-domain, with WAF detection.
  • Stealth headers — 19 user-agent profiles, randomized.
  • Proxy rotation with health checking; failed proxies evicted automatically; poisoned-mutex recovery in the global pool.

Agent Fleet

Each agent self-registers via registerAgent() at module import time. Registration is centralized in src/agents/standardized_agents.ts.

| Agent | Vulnerability class | Tier |
| --- | --- | --- |
| Recon Agent | Subdomain enumeration, tech fingerprinting, endpoint discovery | Haiku |
| Auth Worker Agent | Login flow detection, session acquisition, token refresh | Sonnet |
| OAuth (4 sub-modules) | redirect_uri override, missing state, PKCE downgrade, scope escalation | Sonnet |
| SSRF Hunter | Server-side request forgery, internal service access | Sonnet |
| XSS Hunter | Reflected, stored, and DOM-based cross-site scripting | Sonnet |
| SQLi Hunter | Error-based, blind boolean, blind time-based across multiple engines | Sonnet |
| NoSQL Hunter | NoSQL injection (MongoDB, CouchDB, etc.) | Sonnet |
| GraphQL Hunter | Introspection, batching, nested query depth | Haiku |
| IDOR Hunter | Insecure direct object references, BOLA, two-account access proof | Sonnet |
| SSTI Hunter | Server-side template injection (Jinja2, Freemarker, Velocity, Pug) | Sonnet |
| Command Injection Hunter | OS command execution via user input, OOB callback verification | Sonnet |
| Path Traversal Hunter | Directory traversal and local file inclusion | Sonnet |
| CORS Hunter | Origin reflection, null origin, subdomain wildcard, credential inclusion | Haiku |
| Host Header Hunter | Host header injection, cache poisoning, password-reset link manipulation | Haiku |
| Open Redirect Hunter | URL redirect chains, javascript:-scheme detection | Haiku |
| Prototype Pollution Hunter | JavaScript prototype chain manipulation, gadget chains | Sonnet |
| CRLF Hunter | HTTP header injection via carriage-return / line-feed | Haiku |
| HTTP Smuggling Hunter | Request smuggling (CL.TE, TE.CL, TE.TE) | Sonnet |
| XXE Hunter | XML external entity injection (direct + blind via OOB) | Sonnet |
| JWT Hunter | Algorithm confusion, alg=none, kid injection, jwk smuggling | Sonnet |
| SAML Hunter | Signature wrapping, comment injection, unsigned-assertion replay | Sonnet |
| WebSocket Hunter | Cross-Site WebSocket Hijacking (CSWSH), origin enforcement | Haiku |
| Race Condition Hunter | TOCTOU, double-spend, parallel-request races (20 concurrent probes) | Sonnet |
| Deserialization Hunter | Java serializable, Python pickle, PHP unserialize, .NET BinaryFormatter | Sonnet |
| Cache Hunter | Web cache poisoning, cache deception | Sonnet |
| Subdomain Takeover Hunter | Dangling DNS records, unclaimed cloud resources | Haiku |
| MFA Bypass Hunter | OTP / TOTP / WebAuthn bypass paths | Sonnet |
| Business Logic Hunter | Negative-quantity, zero-cost, currency manipulation patterns | Sonnet |
| Prompt Injection Hunter | LLM prompt injection in AI-powered features | Sonnet |

Tier assignments are enforced by COMPLEXITY_LOCKED_AGENTS in cost_router.ts for security-sensitive types — keyword-based "this looks complex, upgrade to Opus" inflation is blocked for them.
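
A rough picture of how a tier lock can block keyword inflation is given below. Only the `COMPLEXITY_LOCKED_AGENTS` name comes from this README; the agent ids in the set, the keyword list, and the one-step upgrade rule for unlocked agents are all hypothetical.

```typescript
type Tier = "haiku" | "sonnet" | "opus";

// Hypothetical contents: the real set lives in cost_router.ts.
const COMPLEXITY_LOCKED_AGENTS: ReadonlySet<string> = new Set([
  "sqli_hunter",
  "xss_hunter",
]);

// Hypothetical keywords that would otherwise trigger an upgrade.
const COMPLEXITY_KEYWORDS = ["complex", "multi-step", "chained"];

function selectTier(agentId: string, baseTier: Tier, taskDescription: string): Tier {
  // Locked agents always keep their assigned tier, whatever the task text says.
  if (COMPLEXITY_LOCKED_AGENTS.has(agentId)) return baseTier;

  const text = taskDescription.toLowerCase();
  const looksComplex = COMPLEXITY_KEYWORDS.some((keyword) => text.includes(keyword));
  if (!looksComplex) return baseTier;

  // Assumption: unlocked agents move up exactly one tier.
  return baseTier === "haiku" ? "sonnet" : "opus";
}
```

Checking the lock before inspecting the task text means attacker-influenced content in a task description cannot raise spend for locked agents.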


Security Model

Defense in depth, with multiple independent enforcement layers:

Scope validation engine

safe_to_test.rs (1,468 LOC) parses HackerOne JSON scope and enforces a strict default-deny policy. Supports wildcard domains, CIDR notation (IPv4 + IPv6), IP ranges, port lists, and per-host port restrictions. Recent hardening: IPv4-mapped IPv6 normalization (::ffff:a.b.c.d no longer bypasses an IPv4 CIDR), full IDN punycode canonicalization, homoglyph differentiation. 49 dedicated tests with positive AND negative cases.

Human approval gate

Before any state-changing command executes against a live target, an ApproveDenyModal presents the exact command, the requesting agent, the target, the safety category, and any warnings. The user can approve, deny, modify, or pause. Per-category auto-approve rules are available behind an explicit opt-in confirmation dialog. The approval promise has a 60-second timeout backed by an audit-trail log.
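
The 60-second approval timeout can be modelled as a race between the user's decision and a timer. In this sketch the names are hypothetical, and treating both a timeout and a failed prompt as "do not execute" is an assumption consistent with the default-deny posture rather than a documented behaviour.

```typescript
type UserChoice = "approve" | "deny";
type Decision = UserChoice | "timeout";

async function withApprovalTimeout(
  userDecision: Promise<UserChoice>,
  timeoutMs = 60_000,
): Promise<Decision> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<Decision>((resolve) => {
    timer = setTimeout(() => resolve("timeout"), timeoutMs);
  });
  try {
    // Whichever settles first wins.
    return await Promise.race([userDecision, timeout]);
  } catch {
    // Assumption: a failed prompt is treated as a denial.
    return "deny";
  } finally {
    clearTimeout(timer); // avoid leaving a pending timer behind
  }
}

// Only an explicit approval lets the command run.
function mayExecute(decision: Decision): boolean {
  return decision === "approve";
}
```

Keeping `"timeout"` distinct from `"deny"` lets an audit log record why a command did not run.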

Kill switch

kill_switch.rs uses atomic state + file persistence with fsync. Activation broadcasts to all subscribers and calls Sandbox::destroy_all(). The fail-safe on a corrupted state file defaults to ACTIVE — the safest possible state.
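
The fail-safe rule is easy to state in code: anything that cannot be read as a clear "inactive" is treated as ACTIVE. The real implementation is in Rust; the TypeScript sketch below only illustrates the rule, and the JSON state-file shape (`{"active": boolean}`) is a hypothetical format, not the actual one.

```typescript
type KillSwitchState = "ACTIVE" | "INACTIVE";

// Assumption: the persisted state is JSON of the form {"active": true|false}.
function parseKillSwitchState(fileContents: string): KillSwitchState {
  try {
    const parsed: unknown = JSON.parse(fileContents);
    if (typeof parsed === "object" && parsed !== null && "active" in parsed) {
      const active = (parsed as { active: unknown }).active;
      // Only an explicit boolean false is trusted as "inactive".
      if (active === false) return "INACTIVE";
    }
  } catch {
    // Unparseable file: fall through to the fail-safe below.
  }
  // Corrupted, truncated, or unexpected content fails safe to ACTIVE.
  return "ACTIVE";
}
```

The asymmetry is deliberate: a false ACTIVE costs a restart, while a false INACTIVE could let agents keep running after an emergency stop.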

Command execution

pty_manager.rs uses CommandBuilder with explicit argv arrays — never shell string interpolation. Validates against dangerous characters (|, &, ;, $) and sanitizes environment variables. The training command path additionally locks Python invocations to scripts under allowed directories.

Secure credential storage

secure_storage.rs uses AES-256-GCM with HKDF key derivation and per-encryption random nonces. SettingsContext strips apiKeys before any localStorage.setItem(). Credentials are never written to disk in plaintext, never logged, and never included in error messages.

Sandbox isolation

sandbox.rs creates Docker / Podman containers with ReadonlyRootfs, all capabilities dropped (only NET_RAW added), no-new-privileges, non-root user, 2 GB memory cap, 1 CPU, 256 PIDs. Every blocked-prefix and blocked-exact env var is explicitly tested. Tinyproxy enforces scope on shell tools (curl, wget, git, pip, etc.).

Proxy rotation

proxy_pool.rs supports HTTP, HTTPS, and SOCKS5 with round-robin / random / least-recently-used / fastest-first strategies. Continuous health checking. Failed proxies evicted from the active pool. Poisoned-mutex recovery instead of process-wide panic.

Audit trail

All command executions are recorded in asciinema format. Decision logs, agent reasoning traces, and tool invocations are captured for post-session review and (opt-in) training-data collection.


Technology Stack

Backend (Rust 1.70+)

| Component | Crate | Purpose |
| --- | --- | --- |
| Desktop runtime | tauri 2.0 | Native packaging, IPC, system integration |
| Async runtime | tokio | Concurrent I/O |
| Cryptography | ring | Key derivation, AES-GCM |
| HTTP client | reqwest | Outbound requests with proxy support |
| Container API | bollard | Docker / Podman sandbox management |
| Subprocess PTY | portable-pty | Isolated execution with recording |
| Embedded SQL | rusqlite | Knowledge graph + benchmark persistence |
| Errors | thiserror | Typed hierarchies (anyhow only at binary entry) |
| Logging | tracing + tracing-subscriber | Structured logs with spans |

Frontend (TypeScript 5.8 strict)

| Component | Library | Purpose |
| --- | --- | --- |
| UI framework | React 19 | Component architecture |
| Build tool | Vite 7 | Dev server + production bundle |
| Styling | Tailwind CSS 4 | Utility-first dark theme |
| Terminal | xterm.js (@xterm/xterm + addon-fit + addon-web-links) | Embedded PTY view: full ANSI rendering for tool output (nmap, sqlmap, nuclei, ffuf, dalfox, gobuster), clickable URLs, fit-to-panel resize |
| Charts | Recharts | Benchmark + cost dashboards |
| Virtual scrolling | react-virtuoso | Large finding lists |
| Markdown | react-markdown | Report rendering |
| Testing | Vitest + Testing Library | Unit + integration |

AI and data

| Component | Technology | Purpose |
| --- | --- | --- |
| Orchestrator | Anthropic Claude Opus 4.6 | Coordination, planning, synthesis |
| Specialist agents | Claude Sonnet 4.6 / Haiku 4.5 | Per-tier agent reasoning |
| Vector database | Qdrant | Semantic dedup, technique recall |
| Browser automation | Playwright | XSS dialog detection, JS-rendered crawl |
| OOB infrastructure | interactsh + Burp Collaborator + DNS canary | Blind SSRF / XXE / RCE confirmation |
| H1 integration | HackerOne REST API | Program import, report submission |

Installation

Prerequisites

| Requirement | Minimum | Notes |
| --- | --- | --- |
| OS | Linux (Kali tested) | macOS and Windows supported via Tauri |
| Node.js | 18+ | Frontend build |
| Rust | 1.70+ stable | Backend compilation |
| Docker | 20+ | Qdrant + sandbox containers |
| Python | 3.10+ | Optional; only for the experimental training pipeline |
| NVIDIA GPU | 24 GB+ VRAM | Optional; only for local LoRA fine-tuning |

Quick start

git clone https://github.com/JBWolfFlow/Huntress.git
cd Huntress

# One-time bootstrap (installs system tooling, sets up directories)
chmod +x scripts/setup.sh
./scripts/setup.sh

# Node dependencies
npm install

# Start Qdrant in the background
docker compose up -d

# Launch the development build
npm run tauri dev

Production build

npm run tauri build
# Binary:    src-tauri/target/release/
# Installers: src-tauri/target/release/bundle/{deb,AppImage,dmg,msi}/

Optional: install the security tooling suite

chmod +x scripts/install_security_tools.sh
./scripts/install_security_tools.sh

This installs nmap, sqlmap, gobuster, ffuf, nuclei, subfinder, httpx, dnsx, wafw00f, and the rest of the agent toolkit.


Configuration

First-run setup

The setup wizard collects:

  1. AI provider — pick from Anthropic / OpenAI / Google / OpenRouter / Local (Ollama).
  2. API key — written to the OS keychain via Tauri's secure-storage abstraction; never to disk in plaintext.
  3. Per-agent model overrides (optional) — defaults are cost-tier-optimized.
  4. HackerOne API token (optional) — required only for direct report submission.

All settings persist between sessions and can be edited from the Settings panel.

Environment variables

# AI providers (at least one required)
ANTHROPIC_API_KEY=sk-ant-...
OPENAI_API_KEY=sk-...
GOOGLE_AI_API_KEY=...
OPENROUTER_API_KEY=...

# HackerOne (optional)
HACKERONE_API_TOKEN=...

# Vector DB
QDRANT_URL=http://localhost:6333

# Experimental training pipeline (off by default)
EXPERIMENTAL_TRAINING=1
HTB_API_TOKEN=...
HUGGINGFACE_TOKEN=...

Scope file format

Standard HackerOne JSON:

{
  "targets": {
    "in_scope": [
      { "asset_identifier": "*.example.com", "asset_type": "URL", "eligible_for_bounty": true },
      { "asset_identifier": "192.0.2.0/24",  "asset_type": "CIDR", "eligible_for_bounty": true }
    ],
    "out_of_scope": [
      { "asset_identifier": "admin.example.com", "asset_type": "URL" }
    ]
  }
}

Wildcards, CIDR (IPv4 + IPv6), IP ranges, port specifications, and IDN domains are fully supported. The scope validator normalizes IPv4-mapped IPv6 (::ffff:a.b.c.d) and IDN/punycode forms so neither can be used as a bypass.
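
To make the default-deny semantics concrete, here is a deliberately small sketch that evaluates a host against the scope format shown above. The real validator is Rust (safe_to_test.rs) and handles far more (IPv6 CIDR, IP ranges, ports, IDN); every function name here is hypothetical, and the choice that `*.example.com` does not match the bare apex is an assumption.

```typescript
interface ScopeAsset { asset_identifier: string; asset_type: string }
interface ScopeFile { targets: { in_scope: ScopeAsset[]; out_of_scope: ScopeAsset[] } }

// Lowercase, drop a trailing dot, and collapse IPv4-mapped IPv6 to IPv4.
function normalizeHost(host: string): string {
  const lower = host.trim().toLowerCase().replace(/\.$/, "");
  const mapped = /^\[?::ffff:(\d{1,3}(?:\.\d{1,3}){3})\]?$/.exec(lower);
  return mapped?.[1] ?? lower;
}

function ipv4ToInt(ip: string): number | null {
  const parts = ip.split(".");
  if (parts.length !== 4) return null;
  let value = 0;
  for (const part of parts) {
    if (!/^\d{1,3}$/.test(part)) return null;
    const octet = Number(part);
    if (octet > 255) return null;
    value = value * 256 + octet;
  }
  return value;
}

function matchesCidr(host: string, cidr: string): boolean {
  const [base, bitsText] = cidr.split("/");
  if (bitsText === undefined || !/^\d{1,2}$/.test(bitsText)) return false;
  const bits = Number(bitsText);
  const ip = ipv4ToInt(host);
  const net = ipv4ToInt(base ?? "");
  if (ip === null || net === null || bits > 32) return false;
  const blockSize = 2 ** (32 - bits);
  return Math.floor(ip / blockSize) === Math.floor(net / blockSize);
}

function matchesAsset(host: string, asset: ScopeAsset): boolean {
  const pattern = asset.asset_identifier.toLowerCase();
  if (asset.asset_type === "CIDR") return matchesCidr(host, pattern);
  if (pattern.startsWith("*.")) {
    const suffix = pattern.slice(1); // ".example.com"
    return host.endsWith(suffix) && host.length > suffix.length;
  }
  return host === pattern;
}

function isInScope(rawHost: string, scope: ScopeFile): boolean {
  const host = normalizeHost(rawHost);
  // Out-of-scope entries win over any in-scope wildcard.
  if (scope.targets.out_of_scope.some((a) => matchesAsset(host, a))) return false;
  // Default-deny: only an explicit in-scope match returns true.
  return scope.targets.in_scope.some((a) => matchesAsset(host, a));
}
```

With the example scope, `api.example.com` and `::ffff:192.0.2.77` are allowed, while `admin.example.com`, `evilexample.com`, and `192.0.3.1` are denied.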

Docker services

# Core services
docker compose up -d

# Add the OWASP Juice Shop testing target
docker compose --profile testing up -d

| Service | Port | Purpose |
| --- | --- | --- |
| Qdrant (REST) | 6333 | Vector database |
| Qdrant (gRPC) | 6334 | High-performance vector interface |
| Juice Shop | 3001 | Local testing target (testing profile only) |

Usage

Hunt workflow

  1. Import a program. New Hunt → paste a HackerOne URL, drop in a scope JSON, or enter scope manually. The orchestrator scrapes the program page (or reads the JSON), parses scope and rules, and produces a structured briefing.
  2. Choose a strategy. The orchestrator presents 3–5 ranked attack strategies based on asset types, historical bounty data, and detected technologies. Pick one or type a custom instruction.
  3. Watch the dispatch. The Agent Status panel shows live agent state. The chat displays findings, agent reasoning, and orchestrator updates as they happen.
  4. Approve commands. Active-test commands surface in an ApproveDenyModal with full context. Approve, deny, modify, or pause.
  5. Triage findings. Confirmed findings appear with severity badges, validation status, duplicate-check results, and quality scores. Drill into any finding for full evidence and HTTP exchanges.
  6. Submit reports. For confirmed findings, the orchestrator generates a HackerOne-shape PoC report. Review in the Report Editor, edit, and submit through the integrated H1 API.

Operating modes

  • Standard mode — every active-test command requires explicit approval.
  • Auto-approve (per category) — passive recon auto-approved, active testing still gated.
  • Economy mode — Settings → Advanced → Hunt Behavior caps cost per finding.
  • Headless / CLI — for benchmark runs and unattended operations; see npm run benchmark:xbow and scripts/htb_runner.py.

Development

Build commands

npm run tauri dev                       # dev server with hot reload
npm run tauri build                     # production binary + installers
npm run lint                            # tsc + eslint + prettier
cd src-tauri && cargo clippy -- -D warnings
cd src-tauri && cargo fmt

Coding standards

Rust (src-tauri/src/)

  • thiserror for typed errors; anyhow only at binary entry points.
  • Exhaustive enum matching — no wildcard _ on enums that may grow.
  • Arc<Mutex<T>> for shared state with minimal lock duration.
  • tracing for structured logging with spans.
  • Every Tauri command validates input before processing.
  • Mutex poisoning is recovered via into_inner(), not .expect() (no process-wide panics from unrelated thread failures).

TypeScript (src/)

  • Strict mode; no implicit any.
  • Interfaces over type aliases for extensible object shapes.
  • async/await exclusively — no raw .then() chains.
  • Functional React with hooks only.
  • Every invoke() call has a typed command/response pair.

Command execution

  • argv arrays, never template-literal shell strings.
  • Null-byte-joined wire format: ['cmd', 'arg1'].join('\x00').
  • Every command passes through scope validation and the approval gate.
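
The null-byte-joined wire format from the list above can be sketched on the TypeScript side as follows. The function names are hypothetical, and mirroring the backend's dangerous-character check (|, &, ;, $) in the frontend is an assumption; the authoritative validation is described as living in pty_manager.rs.

```typescript
// Characters the README lists as rejected, plus NUL, which must never
// appear inside an argument because it is the field separator.
const DANGEROUS_CHARS = /[|&;$\x00]/;

function encodeCommand(argv: readonly string[]): string {
  if (argv.length === 0) throw new Error("argv must contain at least the program name");
  for (const arg of argv) {
    if (DANGEROUS_CHARS.test(arg)) {
      throw new Error(`rejected argument: ${JSON.stringify(arg)}`);
    }
  }
  // One string on the wire, with unambiguous argument boundaries.
  return argv.join("\x00");
}

function decodeCommand(wire: string): string[] {
  return wire.split("\x00");
}
```

Because arguments are never concatenated into a shell string, a value such as `a b` stays a single argument after decoding. Note that this character set also rejects legitimate values like URLs containing `&`.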

Testing

npm test                            # full TypeScript suite (Vitest)
npm run test:watch                  # watch mode
npm run test:coverage               # with coverage reporting
npm run test:live                   # integration tests (needs running services)
cd src-tauri && cargo test          # Rust suite

Coverage at HEAD (6e8450e)

Suite Tests Status
TypeScript 2,721 passing / 18 skipped / 0 failing across 121 test files
Rust 143 passing / 0 failing / 1 ignored unit + integration
tsc --noEmit --skipLibCheck clean
cargo clippy --all-targets -- -D warnings clean

Test categories

| Category | What it covers | Config |
| --- | --- | --- |
| Unit | Individual modules in isolation | vitest.config.ts (30 s timeout) |
| Integration | Multi-module + external service interactions | vitest.integration.config.ts (120 s timeout) |
| Agent fleet | Every agent initializes, dispatches, and reports correctly | agent_fleet.test.ts |
| Security | Scope-deny paths, kill-switch fail-safe, approval-pipeline rejection | Multiple files |
| Validators | Each of 66 validators with positive AND negative cases | *_validators.test.ts, v16_h1_validators.test.ts, wave7_b*.test.ts |
| Provider | API-key validation, streaming, fallback chains | provider_fallback.test.ts |
| CIDR / IDN | IPv4-mapped IPv6, homoglyph, edge prefixes (/0, /32, /128) | safe_to_test.rs |
| SimHash | 64-bit entropy invariants, near-dup grouping | finding_dedup.test.ts |
| Compose / readiness | YAML regex strictness, HTTPS-aware probe parallelism | v17_*.test.ts |

Project Structure

huntress/
├── src/                                # Frontend (React / TypeScript)
│   ├── agents/                         # 27 specialist agents + auth + recon
│   │   ├── oauth/                      # 4 OAuth sub-modules + discovery
│   │   ├── base_agent.ts               # Abstract base + finding types
│   │   ├── agent_catalog.ts            # Registry
│   │   ├── agent_router.ts             # Selection + dispatch
│   │   └── standardized_agents.ts      # Self-registration trigger
│   ├── components/                     # React UI surfaces
│   │   ├── ChatInterface.tsx           # Primary interaction
│   │   ├── ApproveDenyModal.tsx        # Human approval gate
│   │   ├── ReportReviewModal.tsx       # Submission gate
│   │   └── ...
│   ├── core/
│   │   ├── orchestrator/               # Coordinator engine, dispatch, dedup
│   │   ├── engine/                     # ReAct loop, tool schemas, chain summarizer
│   │   ├── providers/                  # ModelProvider abstractions
│   │   ├── reporting/                  # PoC generation, templates, H1 API
│   │   ├── validation/                 # 27 deterministic validators + OOB server
│   │   ├── http/                       # Request engine, scope check, rate control
│   │   ├── memory/                     # Qdrant integration, hunt history
│   │   ├── benchmark/                  # XBOW runner, persistence, scoring
│   │   ├── auth/                       # Session manager, token refresh
│   │   ├── discovery/                  # Crawler, JS analyzer, schema parser
│   │   └── ...
│   └── tests/                          # 120 test files
├── src-tauri/                          # Backend (Rust / Tauri 2.0)
│   └── src/
│       ├── lib.rs                      # 50+ Tauri commands, module integration
│       ├── safe_to_test.rs             # Scope validator (1,468 LOC, 49 tests)
│       ├── pty_manager.rs              # Secure subprocess execution
│       ├── kill_switch.rs              # Emergency shutdown with persistence
│       ├── proxy_pool.rs               # HTTP/HTTPS/SOCKS5 rotation
│       ├── secure_storage.rs           # OS keychain integration
│       ├── sandbox.rs                  # Container isolation
│       ├── agent_browser.rs            # Playwright Node subprocess manager
│       ├── h1_api.rs                   # HackerOne REST client
│       └── tool_checker.rs             # Security tool availability checks
├── scripts/                            # Automation
│   ├── setup.sh                        # Bootstrap
│   ├── install_security_tools.sh       # Tool installer
│   ├── htb_runner.py                   # HackTheBox training (experimental)
│   └── deploy_production.sh            # Gradual model deployment
├── docker-compose.yml                  # Qdrant + testing services
├── PIPELINE.md                         # Single source of truth for open work
└── README.md                           # This file

Codebase metrics (HEAD)

| Metric | Value |
| --- | --- |
| TypeScript / TSX source files | 330 |
| TypeScript LOC (approx) | ~89,000 |
| Rust source files | 11 |
| Rust LOC | ~7,300 |
| Specialist hunting agents | 27 (+ 2 utility: Auth Worker, Recon) |
| OAuth sub-modules | 4 |
| Validator registrations | 51 (across 47 unique vulnerability types) |
| HackerOne report templates | 30 |
| React components | 19 |
| Tauri IPC commands | 53 (across 11 Rust files) |
| TypeScript tests | 2,721 |
| Rust tests | 143 |

Known Gaps

This section is intentionally explicit. Marketing-style claims that paper over real gaps make the platform less useful, not more.

| Gap | What it means | Status |
| --- | --- | --- |
| Zero triaged HackerOne submissions | We have not yet had a Huntress-generated report accepted, marked duplicate, marked informative, or marked N/A on a real H1 program. Every quality claim about reports is therefore unproven against live triage criteria. | Open milestone; the first submission is the next major step. |
| Benchmark cohort coverage uneven | The v18 38/103 result is the first full-corpus completion, but XSS (4%) and SSRF (0/3) remain capability gaps. L3 challenges (1/8 = 13%) are barely sampled at this corpus size. | Targeted prompt + browser-flow work for XSS hunter; next benchmark batch. |
| Quality scorer not validated against H1 | `report_quality.ts` produces letter grades, but those grades have never been compared to real H1 triage outcomes. The scorer reflects what we think a good H1 report looks like, not what triagers actually accept. | Will be calibrated once submissions accumulate. |
| Training pipeline is experimental | The Axolotl + LoRA local fine-tuning path requires a 24 GB+ GPU and has not been validated end-to-end. Behind the `EXPERIMENTAL_TRAINING` feature flag, excluded from the default test run. | Research preview only. |
| God-object refactors deferred | `react_loop.ts` (~2.5K LOC), `orchestrator_engine.ts` (~3.4K LOC), and `validator.ts` (~5.3K LOC) are large. Concrete pain hasn't appeared yet, so refactoring is deferred rather than premature. | Tracked but not blocking. |

The single source of truth for open and closed work is PIPELINE.md.


Disclaimer

Huntress is for authorized security testing only. This includes:

  • Bug bounty programs with explicit, written authorization (HackerOne, Bugcrowd, Intigriti, etc.).
  • Penetration-testing engagements with signed scope agreements.
  • Security research on systems you own or have written permission to test.
  • Educational use in controlled environments (CTFs, deliberately vulnerable VMs, your own lab).

Users are solely responsible for ensuring they have proper authorization before testing any target. Unauthorized access to computer systems is illegal under the Computer Fraud and Abuse Act (CFAA) in the United States and equivalent legislation in other jurisdictions. The authors and contributors assume no liability for misuse.

The default-deny scope validator is the first line of defense. The human approval gate is the second. The kill switch is the third. None of these substitute for the operator confirming, in writing, that they have authorization for every target they configure.
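
The layering described above can be pictured as a chain of checks in which any single failure denies the request. The sketch below is a simplified illustration under assumed names (`GateInput`, `Decision`, `evaluate` are hypothetical and not taken from the Huntress codebase); the real checks are spread across the Rust scope validator, the approval UI, and the kill switch.

```typescript
// Hypothetical types for illustration; not the Huntress implementation.
interface GateInput {
  host: string;
  allowedHosts: ReadonlySet<string>; // explicit allowlist: anything absent is denied
  operatorApproved: boolean;
  killSwitchActive: boolean;
}

interface Decision {
  allowed: boolean;
  reason: string;
}

function evaluate(input: GateInput): Decision {
  // Layer 1: default-deny scope check. Only hosts listed in scope pass.
  if (!input.allowedHosts.has(input.host.trim().toLowerCase())) {
    return { allowed: false, reason: "host is not in the configured scope" };
  }
  // Layer 2: a human must have approved the action.
  if (!input.operatorApproved) {
    return { allowed: false, reason: "operator approval is missing" };
  }
  // Layer 3: the kill switch overrides scope and approval.
  if (input.killSwitchActive) {
    return { allowed: false, reason: "kill switch is active" };
  }
  return { allowed: true, reason: "in scope, approved, kill switch inactive" };
}
```

Note that no branch falls through to "allowed" by omission: the only path to `allowed: true` is passing all three checks, which is the property a default-deny design is meant to guarantee. None of this replaces written authorization from the program owner.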


License

MIT — see LICENSE for the full text.


Acknowledgments

  • Tauri — desktop application framework
  • Anthropic — Claude AI models powering the orchestrator + specialist fleet
  • Qdrant — vector database
  • Playwright — browser automation for validation + JS-rendered crawl
  • Project Discovery — interactsh, nuclei, subfinder, httpx
  • XBOW — the public Validation Benchmark used as our primary eval harness
  • HackerOne — bug bounty platform and API
  • HackTheBox — training environment for the experimental learning pipeline

Built and maintained by JBWolfFlow under NeuroForge Technologies.
