Contributions are welcome. The project is actively maintained, and there's a clear roadmap. If you're looking to contribute, here's what helps most:
- Bug reports with real steps to reproduce
- Bug fixes (especially around the API or scraper)
- New enrichment sources (more threat intel feeds)
Before starting work on these, open an issue first to avoid wasted effort:
- New pipeline stages (the core pipeline is stable for a reason)
- UI redesigns (the Next.js frontend has an evolving design system)
We don't accept contributions that would make VoidAccess useful for offensive operations, or that weaken the content filter. This tool is for defenders and investigators.
git clone https://github.com/YOUR_USERNAME/voidaccess.git
cd voidaccess
python -m venv venv
source venv/bin/activate # Linux/macOS
venv\Scripts\Activate # Windows
pip install -r requirements.txt
pip install -r dev-requirements.txt
python -m spacy download en_core_web_smCopy .env.example to .env. Only JWT_SECRET is required:
cp .env.example .env
# Edit .env and set JWT_SECRET
# Generate one with: python -c "import secrets; print(secrets.token_hex(32))"Optional keys enable more features:
OPENAI_API_KEY,ANTHROPIC_API_KEY,OPENROUTER_API_KEY— LLM providersOTX_API_KEY,VT_API_KEY,ABUSECH_API_KEY— threat intel feedsTOR_PROXY_HOST=127.0.0.1,TOR_PROXY_PORT=9050— local Tor (required for dark web search)REDIS_URL— for rate limiting (optional)PLAYWRIGHT_ENABLED=true— enables JS rendering for heavy .onion pages
Backend only (FastAPI on port 8000):
uvicorn api.main:app --host 0.0.0.0 --port 8000 --reloadFrontend only (Next.js on port 3000):
cd web
npm install
npm run devFull stack with Docker:
docker compose up --buildThis starts: PostgreSQL (port 5433), Tor proxy (9050), FastAPI (8000), Next.js (3001).
Just database + Redis for live-service development:
docker compose up postgres torThen run the FastAPI backend locally pointing at Docker services:
export DATABASE_URL="postgresql://voidaccess:voidaccess@localhost:5433/voidaccess"
export TOR_PROXY_HOST="127.0.0.1"
export TOR_PROXY_PORT="9050"
uvicorn api.main:app --host 0.0.0.0 --port 8000 --reloadUse GitHub issues. Include:
- VoidAccess version (check
git log -1 --format=%hor Docker tag) - OS and Docker version
- The query that triggered the issue
- What happened vs. what you expected
- Relevant logs
Where logs are in Docker:
docker compose logs fastapi
docker compose logs nextjsIf you're hitting a dark web search engine that returns nothing, that's not a bug — those services go offline frequently. Check the health sidebar in the app before filing.
- Fork the repo
- Create a branch:
fix/short-descriptionfeat/short-descriptiondocs/short-description
- Make your changes
- Open a PR against
main
Your PR needs:
- A clear description of what changed and why
- Before/after screenshots if it's a UI change
- A note if it touches the scraping or extraction pipeline (test it against at least two real queries and include results)
Keep PRs focused. One feature or fix per PR.
Run formatting before pushing:
black .
isort .
ruff check .Type hints are required for new functions. Check types:
mypy .Tests live in tests/. Run them:
pytest tests/
pytest tests/test_api.py -v # specific file
pytest tests/ --cov=. --cov-report=term-missing # with coverageBackend architecture:
api/main.py— FastAPI entry point, route registration, middlewareapi/routes/— HTTP route handlers (depends on services inllm.py,search.py,scrape.py, etc.)db/models.py— SQLAlchemy ORM modelsdb/queries.py— Raw SQL queries for performance-critical operationssources/— Enrichment sources (threat intel APIs)extractor/— Entity extraction pipelinemonitor/— Scheduled jobs and alerting
Route handlers call services. Don't put business logic in route files.
Run linting and type checks:
cd web
npm run lintFrontend architecture:
web/app/— Next.js App Router pagesweb/components/— React componentsweb/lib/— API client, utilities, auth helpers- State lives in React hooks or URL params
Follow existing component patterns.
Don't add dependencies without a good reason. If you need one:
- Add it to
requirements.in(unpinned) - Run:
pip-compile requirements.in --output-file requirements.txt - Explain why in your PR
Same for the frontend — add to web/package.json, explain in PR.
This is the most common meaningful contribution. Here's how it works:
Each source in sources/ is an async function that returns raw data from an external API. The results then get converted to "page-shaped dicts" that the entity extractor can process.
A page-shaped dict has:
{
"url": "https://source.example.com/...",
"link": "https://source.example.com/...",
"content": "extracted text for entity extraction",
"text": "extracted text for entity extraction",
"status": 200,
"source": "source_name",
"title": "Page title",
"via": "source_api",
}In sources/enrichment.py, the enrich_investigation() function runs multiple sources concurrently and aggregates results. Your source should:
- Take a query string and optional entities list
- Return list[dict] of enrichment results
- Handle rate limits gracefully (log a warning and return empty on failure)
- Include a 30-second timeout per request
Example structure (see sources/cisa.py, sources/shodan.py for real implementations):
async def enrich_example(query: str) -> list[dict]:
"""Query Example API for threat intel."""
results = []
try:
async with aiohttp.ClientSession() as session:
async with session.get(f"https://api.example.com/search?q={query}") as resp:
if resp.status == 200:
data = await resp.json()
for item in data.get("results", []):
results.append({
"source": "example",
"entity_value": item.get("indicator"),
"extra_field": item.get("extra"),
})
except asyncio.TimeoutError:
logger.warning("Example: request timed out")
except Exception as e:
logger.warning("Example: error %s", e)
return resultsIn sources/enrichment.py, add your function to the _enrich_new_sources() async gather:
async def _enrich_new_sources(query: str, entities: list[dict]) -> list[dict]:
# ... existing sources ...
example_results, = await asyncio.gather(
enrich_example(query),
return_exceptions=True,
)
if isinstance(example_results, Exception):
example_results = []
pages.extend(_example_to_pages(example_results))Then write a converter function that transforms your raw results into page-shaped dicts.
Your source needs to specify:
- What it queries against (IP, hash, domain, actor name, CVE, etc.)
- What it returns (dict structure with entity_value, source, and meaningful fields)
- Rate limits and how to handle failures (most APIs have free tiers with rate limits)
The content filter blocks material that makes this tool useful for offensive operations. It's not up for modification in any direction — don't try to weaken what it blocks, and don't change how it works silently.
PRs that touch the filter to weaken it will be closed without discussion.
The content filter has its own test coverage in tests/test_content_safety.py. To run just these tests:
pytest tests/test_content_safety.py -vOpen an issue on GitHub. For security issues, see SECURITY.md.