- E — Entities (identification of entities, aliases, emails, IPs).
- S — Signals (capture of digital footprints and metadata).
- T — Targeted (focused on specific objectives).
- E — Extraction (automated extraction from web sources).
- R — Reconnaissance (recon and footprinting).
- O — Open-source (the nature of the OSINT engine).
- I — Intelligence (processing and correlation of data).
- D — Data (massive ingestion of unstructured records).
- E — Engine (the central engine that orchestrates queries).
- S — Scraper (automated and persistent collection).
From the creators of LazyOwn Redteam Framework comes a free and open-source
intelligence (OSINT) aggregator and correlation engine
inspired by Palantir, Bellingcat, Maltego, and Citizen Lab workflows.
A pure open-source re-imagining of the original fucklantir /
osint_palantir toolchain, with a much bigger source catalogue, a
proper knowledge graph, structured parsers, and a multi-backend LLM
analyst.
No payloads. No active scanning. Just 99+ free public OSINT sources, fanned out in parallel, fused into a single intelligence picture.
                          +--------------------------+
query "example.com"       |  Estorides Orchestrator  |  -> STIX 2.1 bundle
------------------------> |  - async fanout          |  -> MISP event JSON
                          |  - 99+ free sources      |  -> GraphML for Gephi
                          |  - structured parsers    |  -> JSONL for training
                          |  - entity resolution     |
                          |  - knowledge graph       |
                          |  - ontology engine       |  <- OFAC SDN cross-check
                          |  - MITRE ATT&CK mapper   |  <- technique auto-tagging
                          |  - SSRF guard            |  <- blocklist at egress
                          |  - audit log + RL        |  <- per-IP trail
                          |  - multi-LLM analyst     |  <- BLUF / tactical / system
                          +--------------------------+
                                       |
                                       v
                   Web UI: map / graph / timeline / results
Estorides is structured around small, single-responsibility registries so adding a new source, backend, inferer, or feed never requires touching the central orchestrator. The five plug-in surfaces are:
| Surface | Decorator / hook | File | Used for |
|---|---|---|---|
| Source parsers | @register_parser("name") | estorides_core/parsers.py | Translate raw HTTP into structured dicts |
| LLM backends | @register("name") | estorides_llm/manager.py | Add an LLM provider (ollama, openai, …) |
| Relationship inferers | @register_inferer("source") | estorides_core/relationship_inference.py | Source -> graph edges |
| Real-time feeds | subclass Feed | estorides_core/feeds.py | Map layers (quakes, fires, news) |
| Encrypted exporters | estorides_export.encryption | estorides_export/encryption.py | STIX/MISP + age encryption |
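The decorator-based registries above follow a common Python pattern: a module-level dict keyed by name, filled at import time by a decorator, and looked up by the orchestrator. The sketch below illustrates that pattern under assumptions; PARSERS, parse(), and the ip-api field mapping are hypothetical, and the real @register_parser in estorides_core/parsers.py may behave differently (for example, it may not reject duplicates).

```python
from typing import Callable

Parser = Callable[[dict], dict]
PARSERS: dict[str, Parser] = {}  # hypothetical registry: source name -> parser


def register_parser(name: str) -> Callable[[Parser], Parser]:
    """Decorator that records a parser under `name` so the orchestrator never changes."""
    def decorator(fn: Parser) -> Parser:
        if name in PARSERS:
            raise ValueError(f"parser {name!r} already registered")
        PARSERS[name] = fn
        return fn  # the function stays usable directly
    return decorator


@register_parser("ipapi")
def parse_ipapi(raw: dict) -> dict:
    # Hypothetical mapping of an ip-api style JSON payload to flat facts.
    return {"country": raw.get("country"), "asn": raw.get("as"),
            "lat": raw.get("lat"), "lon": raw.get("lon")}


def parse(source: str, raw: dict) -> dict:
    """Orchestrator side: look the parser up by name, else pass the raw payload through."""
    parser = PARSERS.get(source)
    return parser(raw) if parser else raw
```

Adding a source parser is then one decorated function in one file; nothing that calls parse() needs to know it exists.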
| Capability | Original | Estorides |
|---|---|---|
| Number of free OSINT sources | ~20 | 99 |
| Intelligence categories | 6 | 12 |
| HTTP fanout model | sequential | async |
| Retries + backoff + circuit breaker | basic | yes |
| Response cache (SQLite) | none | yes |
| Per-source parsers | none | 50+ |
| Entity extraction (IP, domain, CVE…) | regex only | structured + dedup |
| Knowledge graph | none | NetworkX + GraphML |
| STIX 2.1 / MISP export | none | yes |
| Multi-LLM (Ollama / OpenAI / Anthropic) | Ollama only | 4 backends + stub |
| Map (geolocation results) | PyVista 3D | Leaflet 2D |
| Force-directed graph view | none | D3.js |
| API key handling | none | per-source env vars |
| Paid source support | none | flag-based opt-in |
| OFAC SDN sanctions cross-check | none | ontology engine |
| MITRE ATT&CK technique auto-tagging | none | ~40 techniques |
| SSRF / private-NW egress guard | none | RFC1918 + cloud IMDS blocked |
| Audit log (per request, append-only) | none | JSONL with IP+query+latency |
| Per-IP rate limit (sliding window) | none | default 30/min, env-tunable |
| Encrypted export (age) | none | opt-in via ?key=age1… |
| Real-time feed layers | none | earthquakes + fires + news |
| Capability | v1.0 | v1.1 (this) |
|---|---|---|
| Persistent graph (Cypher queries) | NetworkX dump | Kùzu embedded DB, cross-run joins |
| Run persistence | JSONL append | SQLite cases with FK observations/entities |
| Cross-feed entity resolver | none | Wikidata + OFAC + IP-API + NVD via intel_resolver |
| Fuzzy entity clustering | exact dedup | difflib SequenceMatcher, 0.85 threshold, aliases surfaced |
| Extra OSINT endpoints (keyless) | 99 YAML sources | +7 (BGP, MAC, phone, GitHub, leaks, CISA KEV, malware C2) |
| Read-only Cypher endpoint | none | /api/intel/graph?q=... with write-keyword guard |
| Case history UI | none | Cases tab + full-entity inspector |
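The fuzzy-clustering row above names difflib's SequenceMatcher with a 0.85 threshold. A minimal sketch of how such clustering could work, assuming casefold-plus-whitespace normalisation and a greedy "join the first similar cluster" strategy; neither detail is confirmed by the source, and cluster_names() is a hypothetical helper.

```python
from difflib import SequenceMatcher

THRESHOLD = 0.85  # similarity cut-off quoted in the table above


def _norm(name: str) -> str:
    # Assumed normalisation: casefold and collapse runs of whitespace.
    return " ".join(name.casefold().split())


def cluster_names(names: list[str], threshold: float = THRESHOLD) -> list[list[str]]:
    """Greedy clustering: each name joins the first cluster whose representative
    (its first member) is at least `threshold` similar; otherwise it starts a new one."""
    clusters: list[list[str]] = []
    for name in names:
        for cluster in clusters:
            ratio = SequenceMatcher(None, _norm(cluster[0]), _norm(name)).ratio()
            if ratio >= threshold:
                cluster.append(name)  # later members would surface as aliases
                break
        else:
            clusters.append([name])
    return clusters
```

For example, cluster_names(["Acme Corp", "ACME Corp.", "Globex"]) groups the first two (ratio about 0.95) and leaves Globex on its own. The greedy pass compares each name against every cluster so far, which is fine at per-run entity counts.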
                          +--------------------------+
query "example.com"       |  Estorides Orchestrator  |  -> STIX 2.1 / MISP / GraphML / JSON
------------------------> |  + async fanout          |
                          |  + 99 free sources       |
                          |  + 7 Osiris-style probes |
                          |  + SSRF guard + audit    |
                          |  + ontology engine       |
                          |  + MITRE ATT&CK mapper   |
                          |  + multi-LLM analyst     |
                          |  + cross-feed resolver   |  <- Wikidata SPARQL + OFAC + IP-API + NVD
                          |  + fuzzy entity cluster  |  <- difflib SequenceMatcher
                          +------------+-------------+
                                       |
          +----------------------------+----------------------------+
          v                            v                            v
+------------------+         +------------------+         +------------------+
| Kùzu graph DB    |         | SQLite case store|         | In-memory NX     |
| (Cypher queries) |         | (FK observations)|         | (per-run working)|
| 99 node labels   |         | search by entity |         | per-run edges    |
| 9 REL types      |         | search by query  |         |                  |
+------------------+         +------------------+         +------------------+
          ^                            ^
          +-----/api/intel/resolve-----+
          +-------/api/cases/...-------+
| Endpoint | Purpose |
|---|---|
| GET /api/cases?q=<substr>&type=<qtype> | List past runs. Searchable by query substring. |
| GET /api/cases/<id>?full=1 | Replay a case. full=1 includes observations + entities. |
| DELETE /api/cases/<id> | Drop a case. |
| GET /api/intel/resolve?type=<t>&id=<v> | Cross-feed resolution. type is one of ip, domain, company, person, country, cve, btc_address, eth_address. |
| GET /api/intel/graph?q=<cypher> | Read-only Cypher against the Kùzu graph. Mutations (CREATE/MERGE/SET/DELETE) are rejected. |
| GET /api/intel/stats | One-glance dashboard: case count, Kùzu node/edge counts, resolver cache size. |
| GET /api/osiris/bgp?query=<ip\|ASxxxxx> | BGP / ASN lookup via bgpview.io. |
| GET /api/osiris/mac?mac=00:1A:... | MAC OUI vendor via macvendors.co. |
| GET /api/osiris/phone?number=+14155552671 | Phone geolocation (NANP area code → lat/lng). |
| GET /api/osiris/github?user=torvalds | GitHub user + 5 most recent repos. |
| GET /api/osiris/leaks?email=... | XposedOrNot breach analytics (more detail than HIBP). |
| GET /api/osiris/cisa-kev?limit=10&days=30 | CISA Known Exploited Vulnerabilities, recent window. |
| GET /api/osiris/malware?limit=200 | Feodo Tracker + URLhaus active C2, geolocated. |
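The /api/intel/graph endpoint rejects mutations by keyword. A minimal sketch of such a guard, assuming a case-insensitive whole-word match on exactly the four keywords listed (CREATE, MERGE, SET, DELETE); is_read_only() is a hypothetical name, and the real guard may differ.

```python
import re

# Hypothetical guard: refuse any query containing a write keyword as a whole word.
WRITE_KEYWORDS = ("CREATE", "MERGE", "SET", "DELETE")
_WRITE_RE = re.compile(r"\b(" + "|".join(WRITE_KEYWORDS) + r")\b", re.IGNORECASE)


def is_read_only(cypher: str) -> bool:
    """True when no write keyword appears anywhere in the query text."""
    return _WRITE_RE.search(cypher) is None
```

The word boundaries keep property names such as n.offset from tripping the guard, while DETACH DELETE is still caught. A keyword scan errs on the side of refusal (a string literal containing "set" is rejected too) and covers only the listed verbs, so other mutating clauses such as REMOVE would need to be added explicitly.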
The case store keeps a per-run silo: each investigation writes its own
copy of what it saw, keyed by case_id. The same entity seen across fifty
runs becomes fifty rows, and the relational store cannot answer "everything
we know about X, from every source, across every case".
The fusion datastore (estorides_core/fusion_store.py) is the
data-fusion layer that closes that gap — the relational analogue of the
Kùzu graph. Every run feeds it, and it accumulates a normalised,
deduplicated, source-attributed fact base across runs:
- Deterministic identity: every entity gets sha1(type:normalized) as its id, so the same real-world entity computed in two different runs lands on the same row with no coordination. The resolver's canonical_id is recorded alongside but is never the dedup key.
- Provenance survives the merge: when N feeds corroborate an entity, the record is merged but every contributing source is retained, so source_count grounds confidence in how many independent feeds agree (e.g. 1.1.1.1 corroborated by 20 sources).
- Property fusion with conflict preserved: each source's flat facts are attributed to the target entity. Agreement is surfaced (country=Australia by 2 sources); disagreement is kept with its provenance (region=New South Wales vs region=Queensland) instead of silently picking one, which is exactly what an intelligence fusion store must do.
- Relationships fused: analytic edges from the knowledge graph (skipping the observed_by/co_occurs plumbing) accumulate across runs, with both endpoints always materialised so the graph is navigable from either side.
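The identity, provenance, and conflict rules above can be sketched in a few lines. This is an illustration, not fusion_store.py: the normalisation (strip + lowercase) and the FusedEntity class are assumptions, and the real store persists all of this in SQLite rather than in memory.

```python
import hashlib
from collections import defaultdict


def entity_id(etype: str, value: str) -> str:
    """Deterministic id: sha1 over "type:normalized" (normalisation assumed here)."""
    normalized = value.strip().lower()
    return hashlib.sha1(f"{etype}:{normalized}".encode("utf-8")).hexdigest()


class FusedEntity:
    """Hypothetical in-memory stand-in for one fused entity row."""

    def __init__(self, etype: str, value: str) -> None:
        self.id = entity_id(etype, value)
        self.sources: set[str] = set()
        # property key -> value -> set of sources asserting that value
        self.props: dict = defaultdict(lambda: defaultdict(set))

    def observe(self, source: str, facts: dict[str, str]) -> None:
        self.sources.add(source)  # provenance is accumulated, never overwritten
        for key, val in facts.items():
            self.props[key][val].add(source)

    @property
    def source_count(self) -> int:
        return len(self.sources)

    def corroborated(self, min_sources: int = 2) -> dict[str, list[str]]:
        """Values asserted by at least `min_sources` distinct sources."""
        out: dict[str, list[str]] = {}
        for key, vals in self.props.items():
            agreed = [v for v, srcs in vals.items() if len(srcs) >= min_sources]
            if agreed:
                out[key] = agreed
        return out

    def conflicts(self) -> dict[str, dict[str, set[str]]]:
        """Keys where sources disagree, each candidate value kept with its sources."""
        return {k: dict(vals) for k, vals in self.props.items() if len(vals) > 1}
```

Feeding it the README's example (two sources agreeing on country=Australia but disagreeing on region) yields country under corroborated() and both regions, each with its source, under conflicts().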
It mirrors the rest of the persistence layer: WAL SQLite, one serialised
connection, and fail-soft — without a writable data dir a run still
returns, it just leaves nothing in the fused store. Toggle with
ESTORIDES_FUSION_ENABLED=0; relocate with ESTORIDES_FUSION_DB=/path.
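A minimal sketch of the "WAL SQLite, one serialised connection, fail-soft" pattern described above, using only the two documented environment variables. The default path data/fusion.sqlite, the entity table, and the FusionStore class are assumptions for illustration; the real schema is richer.

```python
import os
import sqlite3
import threading
from typing import Optional


class FusionStore:
    """Hypothetical store: a single WAL connection guarded by a lock, failing soft."""

    def __init__(self) -> None:
        self._lock = threading.Lock()  # serialises every use of the one connection
        self._conn: Optional[sqlite3.Connection] = None
        if os.environ.get("ESTORIDES_FUSION_ENABLED", "1") == "0":
            return  # disabled: writes become no-ops
        path = os.environ.get("ESTORIDES_FUSION_DB", "data/fusion.sqlite")  # default assumed
        try:
            os.makedirs(os.path.dirname(path) or ".", exist_ok=True)
            conn = sqlite3.connect(path, check_same_thread=False)
            conn.execute("PRAGMA journal_mode=WAL")
            conn.execute("CREATE TABLE IF NOT EXISTS entity "
                         "(id TEXT PRIMARY KEY, type TEXT, value TEXT)")
            self._conn = conn
        except (OSError, sqlite3.Error):
            self._conn = None  # fail-soft: the run still returns, nothing is persisted

    @property
    def enabled(self) -> bool:
        return self._conn is not None

    def upsert_entity(self, eid: str, etype: str, value: str) -> bool:
        if self._conn is None:
            return False
        with self._lock, self._conn:  # the connection context commits or rolls back
            self._conn.execute("INSERT OR IGNORE INTO entity VALUES (?, ?, ?)",
                               (eid, etype, value))
        return True
```

WAL lets readers (for example the /api/fusion endpoints) proceed while the single writer appends, and routing every write through one lock avoids SQLite's "database is locked" contention between threads.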
| Endpoint | Purpose |
|---|---|
| GET /api/fusion/stats | Size of the fused base: entities, multi-source count, observations, properties, relationships, by-type breakdown. |
| GET /api/fusion/sources | YAML source catalogue with accumulated fetch/ok counters. |
| GET /api/fusion/entities?q=&type=&min_sources=2 | Search fused entities. min_sources=N is the fusion-native "only what ≥N feeds corroborate" filter. |
| GET /api/fusion/entity/<id>?min_sources=2 | Full fused view of one entity: provenance, properties, edges, and the corroborated (multi-source-agreed) properties. |
The fused stats are also folded into GET /api/intel/stats under fusion.
python3 estorides_cli.py run 1.1.1.1 # fan out + fuse into the store
python3 estorides_cli.py fusion stats # how big is the fused base
python3 estorides_cli.py fusion entities --min-sources 2 # only corroborated entities
python3 estorides_cli.py fusion entity <id> # full provenance + properties
python3 estorides_cli.py fusion sources # per-source fetch history
pip install -r requirements.txt
The only new required dep is kuzu>=0.11. The orchestrator falls
back to in-memory NetworkX if Kùzu is not importable, but a persistent
cross-run graph only happens with Kùzu present.
cd estorides
python3 -m pip install flask networkx requests pyyaml
Optional, for a real LLM:
# pick one
ollama serve && ollama pull llama3.1:8b
export OPENAI_API_KEY=sk-...
export ANTHROPIC_API_KEY=sk-ant-...
export OPENROUTER_API_KEY=sk-or-...
# 99 sources, 12 categories
python3 estorides_cli.py status
# run a query (free sources only)
python3 estorides_cli.py run 8.8.8.8
# enable sources that need an API key
python3 estorides_cli.py run user@example.com --include-paid
# only a subset of sources
python3 estorides_cli.py run example.com \
--only-sources crt_sh_certificates,shodan_internetdb,ipapi_free
# export the latest run as STIX 2.1 or MISP
python3 estorides_cli.py stix --out my_bundle.json
python3 estorides_cli.py misp --out my_event.json
python3 estorides_cli.py serve --port 5050
# open http://127.0.0.1:5050
UI features:
- 2D map (Leaflet) of every geolocated result
- D3.js force-directed knowledge graph (drag, zoom, hover)
- Timeline of source acquisition
- Source results panel with per-source parsed output
- Filterable entity list
- LLM analysis with backend / model badge
- One-click export: STIX 2.1, MISP, GraphML, JSON
The Graph canvas turns observations into an interactive intelligence workbench:
- Clusters: nodes are grouped into communities (translucent hulls) and coloured by cluster. Inter-cluster links are dashed/highlighted; click one to see a cross-reference tooltip explaining how two clusters relate (the bridge entities + relation).
- Click to enrich: left-click a node to resolve it (cross-feed + VirusTotal relationships) and merge the new nodes/links into both the graph and the map. Each new node is itself clickable, so exploration is recursive.
- Intelligence tiers: every node carries an auto-computed level shown as a coloured ring: data → information (≥2 corroborating sources) → intelligence (cross-cluster / resolved) → counter-intelligence (sanction / threat / VirusTotal-malicious). Override any node's level from the right-click menu or the inspector; overrides persist in the browser.
- Transforms: right-click a node (or use the side inspector panel) for transforms grouped by tier: data → information → intelligence → counter-intelligence.
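The tier rules listed above form a simple precedence ladder. The UI implements them in JavaScript; the Python sketch below only illustrates the ordering, and the Node fields are hypothetical stand-ins for whatever the graph view tracks.

```python
from dataclasses import dataclass, field
from typing import Optional

TIERS = ("data", "information", "intelligence", "counter-intelligence")


@dataclass
class Node:
    # Hypothetical flags standing in for the graph view's node state.
    sources: set = field(default_factory=set)
    cross_cluster: bool = False
    resolved: bool = False
    sanctioned: bool = False
    threat: bool = False
    vt_malicious: bool = False
    override: Optional[str] = None  # analyst override from the UI


def tier(node: Node) -> str:
    """Highest tier whose condition holds; a valid manual override always wins."""
    if node.override in TIERS:
        return node.override
    if node.sanctioned or node.threat or node.vt_malicious:
        return "counter-intelligence"
    if node.cross_cluster or node.resolved:
        return "intelligence"
    if len(node.sources) >= 2:
        return "information"
    return "data"
```

Checking the strongest condition first means a node that is both corroborated and VirusTotal-malicious lands in counter-intelligence rather than information.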
VirusTotal is integrated both as a source (vt_ip, vt_domain,
vt_file) and as the relationship engine behind graph expansion
(resolved domains/IPs, communicating/dropped files, contacted infra).
It needs a free API key; without it VirusTotal stays inactive and the
rest of the platform is unaffected:
export VT_API_KEY=... # https://www.virustotal.com/gui/my-apikey
- DNS Intelligence (9) - Google DoH, Cloudflare DoH, HackerTarget, crt.sh, Cert Spotter, RDAP, DNS Dumpster, host search
- IP & Infrastructure (13) - ip-api, ipinfo, ipapi.co, ipwho.is, Shodan InternetDB, GreyNoise, ipwhois, Robtex, RDAP, AS lookup, AbuseIPDB, MAC OUI, RIPE Stat
- Web Intelligence (10) - urlscan, Wayback CDX, Wayback availability, HTTP headers, whois, geoip, traceroute, nping, Microlink, Google cache
- Social Media (13) - GitHub, Reddit, Mastodon, Keybase, HackerNews, Telegram, Pinterest, WordPress, Medium, DEV.to
- Threat Intelligence (13) - ThreatFox, URLhaus, payloads, PhishTank, OpenPhish, OTX (+passive domain/IP, no key), MalwareBazaar, Feodo, SSLBL, Emerging Threats, blocklist.de
- Breach Intelligence (6) - HIBP breaches, HIBP pastes, Phonebook email, Phonebook domain, DeHashed, IntelligenceX
- Geolocation (5) - Nominatim search + reverse, OpenWeather geocoding, TimeZoneDB, Wikidata
- Knowledge (12) - Wikipedia, summary, DuckDuckGo IA, OpenAlex, Crossref, arXiv, GitHub advisories, NVD CVE, cve.circl, ExploitDB, Reddit subreddit search
- Wireless (5) - WiGLE, IEEE OUI, OpenSky, MarineTraffic, N2YO
- Blockchain (5) - blockchain.info (balance + tx), Blockstream, Ethplorer, mempool.space
- Paste & Leaks (4) - psbdmp, GitHub gist search, TGStat, LeakCheck
- Visual (4) - ScreenshotMachine, Microlink, TinEye, EXIF
Sources are addons: one YAML file per source, organised into category
subdirectories under sources/ (lazyaddons-style). The loader recurses, so
add a new source by dropping sources/<NN_category>/<name>.yaml — no central
registry to edit. Grouped multi-document files still load if present. Point
ESTORIDES_SOURCES_DIR at another tree to use your own addon set. The schema
is documented at the top of estorides_core/source_loader.py.
sources/
  01_dns/
    dns_google.yaml
    crt_sh_certificates.yaml
  02_ip_infra/
    shodan_internetdb.yaml
  ...
tools/split_sources.py migrates legacy grouped files into this layout.
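The recursive addon discovery described above can be sketched with pathlib. This assumes the category is the immediate parent directory name, that both .yaml and .yml count, and that a "category" key may be injected into each document; the actual schema lives in estorides_core/source_loader.py, and both function names are hypothetical.

```python
import os
from pathlib import Path


def discover_sources(root=None) -> list[tuple[str, Path]]:
    """Recursively find addon files, returning (category_dir, path) in sorted order."""
    base = Path(root or os.environ.get("ESTORIDES_SOURCES_DIR", "sources"))
    files = sorted(p for p in base.rglob("*")
                   if p.suffix in {".yaml", ".yml"} and p.is_file())
    return [(p.parent.name, p) for p in files]


def load_sources(root=None) -> list[dict]:
    """Parse every YAML document in every file, so grouped multi-document files
    still load. Requires PyYAML, imported lazily."""
    import yaml  # PyYAML, from the install step

    docs = []
    for category, path in discover_sources(root):
        with path.open(encoding="utf-8") as fh:
            for doc in yaml.safe_load_all(fh):
                if isinstance(doc, dict):
                    docs.append({"category": category, **doc})  # hypothetical key
    return docs
```

Because discovery is a filesystem walk, dropping sources/<NN_category>/<name>.yaml into the tree is the whole registration step.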
sources/                      one YAML per addon, grouped by category dir (99 addons)
estorides_core/
  config.py                   every tunable (env-overridable)
  source_loader.py            registry, validation, lookup
  async_client.py             aiohttp + circuit breaker + SQLite cache
  parsers.py                  50+ structured parsers (ipapi, dns_json, crtsh…)
  entity_extraction.py        regex-based entity finder with dedup
  knowledge_graph.py          NetworkX MultiDiGraph + GraphML export
  orchestrator.py             glues everything, infers higher-level relations
estorides_llm/
  manager.py                  multi-backend LLM (Ollama → OpenRouter → Anthropic → OpenAI → stub)
estorides_export/
  stix.py                     STIX 2.1 bundle export
  misp.py                     MISP event JSON export
estorides_cli.py              argparse CLI
estorides_web.py              Flask app
templates/index.html          UI
static/{css,js}/estorides.*   UI styles + D3 controller
- Start with the free-tier sources (the default); that is 80+ endpoints.
- Set ESTORIDES_PARALLEL=16 for faster fanout.
- Set ESTORIDES_TIMEOUT=20 if your network is slow.
- Paid sources stay off unless you pass --include-paid in the CLI. To disable LLM backends you don't have keys for, set ESTORIDES_DISABLE_BACKENDS=openai,anthropic.
- The SQLite cache lives in data/estorides_cache.sqlite; delete it to force fresh fetches.
- The LLM stage needs a generative model. Ollama auto-selects an installed model if ESTORIDES_OLLAMA_MODEL is unset, but an embedding-only model (e.g. *:e2b) returns no text and the run falls back to the stub. Run ollama pull llama3.1:8b for real analysis.
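ESTORIDES_PARALLEL and ESTORIDES_TIMEOUT tune a bounded async fanout. A minimal asyncio sketch of that shape, assuming a semaphore caps in-flight requests, each source gets its own timeout, and the whole run has a wall-clock deadline; fanout() and its defaults are illustrative, and the real async_client.py also layers retries, caching, and a circuit breaker on top.

```python
import asyncio
from typing import Awaitable, Callable

Fetcher = Callable[[], Awaitable[dict]]


async def fanout(fetchers: dict[str, Fetcher], parallel: int = 16,
                 timeout: float = 20.0, deadline: float = 30.0) -> dict[str, dict]:
    """Run all sources concurrently, at most `parallel` in flight.
    Each source gets `timeout` seconds; the whole fanout gets `deadline`."""
    if not fetchers:
        return {}  # asyncio.wait rejects an empty task set
    sem = asyncio.Semaphore(parallel)

    async def run_one(name: str, fetch: Fetcher) -> tuple[str, dict]:
        async with sem:  # caps concurrent requests
            try:
                return name, await asyncio.wait_for(fetch(), timeout)
            except asyncio.TimeoutError:
                return name, {"error": "timeout"}
            except Exception as exc:  # one failing source must not sink the run
                return name, {"error": repr(exc)}

    tasks = [asyncio.create_task(run_one(n, f)) for n, f in fetchers.items()]
    done, pending = await asyncio.wait(tasks, timeout=deadline)
    for task in pending:
        task.cancel()  # wall-clock cap reached: drop whatever is still running
    await asyncio.gather(*pending, return_exceptions=True)
    return dict(task.result() for task in done)
```

Sources that miss the overall deadline are simply absent from the result, so a slow upstream shortens the report instead of stalling it.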
| Env var | Default | What it caps |
|---|---|---|
| ESTORIDES_DEADLINE (CLI: --deadline) | 30s | hard wall-clock cap for the whole fanout |
| ESTORIDES_ENTITY_MAX_SCAN | 120000 | chars scanned per response (huge crt.sh/wayback dumps) |
| ESTORIDES_ENTITY_MAX_PER_TYPE | 750 | entities kept per type per source |
| ESTORIDES_KG_MAX_COOCCUR | 30 | entities per source in the co-occurrence clique (O(n²) guard) |
| ESTORIDES_LLM_REQUEST_TIMEOUT | 12s | per-call LLM HTTP timeout (no orphaned threads) |
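Why ESTORIDES_KG_MAX_COOCCUR matters: a co-occurrence clique over n entities has n(n-1)/2 edges, so 100 entities from one source would add 4,950 edges while a cap of 30 limits that to 435. A sketch, assuming the cap keeps the first N entities (which entities survive is not specified in the source) and using a hypothetical env_int() helper for the env-overridable default.

```python
import os
from itertools import combinations


def env_int(name: str, default: int) -> int:
    """Read an integer tunable from the environment, falling back on bad input."""
    try:
        return int(os.environ.get(name, default))
    except ValueError:
        return default


KG_MAX_COOCCUR = env_int("ESTORIDES_KG_MAX_COOCCUR", 30)


def cooccurrence_edges(entities: list[str], cap: int = KG_MAX_COOCCUR) -> list[tuple[str, str]]:
    """All unordered pairs among at most `cap` entities: at most cap*(cap-1)/2 edges."""
    kept = entities[:cap]  # assumption: keep the first `cap` entities
    return list(combinations(kept, 2))
```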
For attack-surface scoping the two things that matter are: never let the target observe a probe attributable to your recon window, and never let a queried broker tie the lookups back to your real IP. Estorides enforces both at the engine level.
Every source declares how its traffic reaches the target:
| contact | Meaning | In --passive-only? |
|---|---|---|
| none (default) | Only a third-party DB / resolver / CT log is hit; the target sees nothing | kept |
| broker | A third party actively probes the target on your behalf (ping, traceroute, header fetch) | excluded |
| active | The engine connects to the target's own infrastructure directly | excluded |
An unknown/typo class is treated as active, so a passive-only run can
never be silently widened. --passive-only is enforced even for an
explicit --only-sources list. Sources that log your lookups also carry
logs_queries: true (surfaced in status).
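The two safety rules above (unknown classes count as active, and --passive-only still applies to an explicit --only-sources list) can be sketched as a filter applied after name selection. Source, effective_contact(), and select_sources() are hypothetical; the real logic lives in orchestrator._select_sources.

```python
from dataclasses import dataclass
from typing import Optional

KNOWN_CONTACT = {"none", "broker", "active"}


@dataclass
class Source:
    name: str
    contact: str = "none"  # default class per the table above
    logs_queries: bool = False


def effective_contact(src: Source) -> str:
    # Unknown or misspelled classes are treated as the most intrusive one.
    return src.contact if src.contact in KNOWN_CONTACT else "active"


def select_sources(sources: list[Source], passive_only: bool,
                   only: Optional[set[str]] = None) -> list[Source]:
    picked = [s for s in sources if only is None or s.name in only]
    if passive_only:
        # Runs after the explicit-list filter, so --only-sources cannot widen it.
        picked = [s for s in picked if effective_contact(s) == "none"]
    return picked
```

Defaulting unknown values to "active" makes a typo in a YAML addon shrink a passive run rather than silently expand it.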
# scope a domain without ever touching its infrastructure
python3 estorides_cli.py discover example.com --passive-only --out-json surface.json
python3 estorides_cli.py run example.com --passive-only
Route every outbound request through a proxy so brokers never see your
real IP. SOCKS (Tor) needs aiohttp_socks; HTTP/HTTPS proxies work with
stock aiohttp and a comma-separated pool rotates per request.
python3 estorides_cli.py run example.com --passive-only --tor
python3 estorides_cli.py run example.com --proxy socks5://127.0.0.1:9050
export ESTORIDES_HTTP_PROXY_POOL="http://p1:8080,http://p2:8080"
Fail-closed: if a SOCKS proxy is requested without aiohttp_socks
installed, the client refuses to run rather than fall back to a
deanonymising direct connection. When proxying, the SSRF guard's local
DNS-resolution leg is skipped (ESTORIDES_PROXY_REMOTE_DNS=1, default) so
your resolver never learns which targets you are investigating — the
literal-host guard still runs and the exit node resolves the name.
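The pool rotation and fail-closed behaviour above can be sketched as follows. proxy_cycle() is a hypothetical helper: it parses ESTORIDES_HTTP_PROXY_POOL, refuses SOCKS entries when the aiohttp_socks module cannot be found, and otherwise yields proxies round-robin. The SOCKS check is injectable so it can be tested without the library installed.

```python
import importlib.util
import itertools
import os
from typing import Callable, Iterator, Optional


def _socks_lib_present() -> bool:
    return importlib.util.find_spec("aiohttp_socks") is not None


def proxy_cycle(pool: Optional[str] = None,
                socks_available: Callable[[], bool] = _socks_lib_present) -> Optional[Iterator[str]]:
    """Round-robin over a comma-separated proxy pool; None means no proxy configured.
    Raises instead of falling back to a direct connection when SOCKS support is missing."""
    raw = pool if pool is not None else os.environ.get("ESTORIDES_HTTP_PROXY_POOL", "")
    proxies = [p.strip() for p in raw.split(",") if p.strip()]
    if not proxies:
        return None
    if any(p.lower().startswith("socks") for p in proxies) and not socks_available():
        # Fail closed: never silently deanonymise the operator.
        raise RuntimeError("SOCKS proxy requested but aiohttp_socks is not installed")
    return itertools.cycle(proxies)
```

Each next() value from an HTTP pool can be passed as aiohttp's per-request proxy= argument; SOCKS proxies instead need a connector from aiohttp_socks, which is why the library check matters.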
Turn a discovered surface into in/out-of-scope flat lists you can pipe into the active phase. Out-of-scope always wins, so an excluded asset is never targeted by accident.
python3 estorides_cli.py scope \
--assets surface.json \
--scope program_scope.txt \
--out scope_result.json \
--flat-dir ./scope_out
# -> scope_out/in_scope_hosts.txt, in_scope_ips.txt, unknown.txt
Rules file grammar (one per line, # comments; a ## out-of-scope divider separates the two lists):
*.example.com wildcard host suffix (apex + subdomains)
api.example.com exact host
192.0.2.0/24 CIDR (IPv4 or IPv6)
re:^staging-[0-9]+\.ex regex (prefix re:)
## out-of-scope
blog.example.com
192.0.2.200/32
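A minimal sketch of the rules grammar above, with out-of-scope checked first so it always wins. It assumes the rule is the first whitespace-delimited token on a line (so the annotations shown above are ignored, and regex rules must not contain spaces); parse_rules(), rule_matches(), and classify() are hypothetical names, not the scope command's code.

```python
import ipaddress
import re


def parse_rules(text: str) -> tuple[list[str], list[str]]:
    """Split a rules file into (in_scope, out_of_scope) at the '## out-of-scope' divider."""
    ins: list[str] = []
    outs: list[str] = []
    target = ins
    for line in text.splitlines():
        line = line.strip()
        if line.lower().startswith("## out-of-scope"):
            target = outs
            continue
        if not line or line.startswith("#"):
            continue
        target.append(line.split()[0])  # assumption: first token is the rule
    return ins, outs


def rule_matches(rule: str, asset: str) -> bool:
    asset = asset.lower().rstrip(".")
    if rule.startswith("re:"):
        return re.search(rule[3:], asset) is not None
    try:
        net = ipaddress.ip_network(rule, strict=False)
    except ValueError:
        net = None  # not a CIDR rule
    if net is not None:
        try:
            return ipaddress.ip_address(asset) in net
        except ValueError:
            return False  # hostnames never match CIDR rules
    rule = rule.lower()
    if rule.startswith("*."):
        apex = rule[2:]
        return asset == apex or asset.endswith("." + apex)  # apex + subdomains
    return asset == rule


def classify(asset: str, ins: list[str], outs: list[str]) -> str:
    if any(rule_matches(r, asset) for r in outs):
        return "out"  # out-of-scope always wins
    if any(rule_matches(r, asset) for r in ins):
        return "in"
    return "unknown"
```

With the example rules, blog.example.com and 192.0.2.200 come back "out" even though *.example.com and 192.0.2.0/24 would otherwise include them.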
- This is a passive intelligence tool. It does not probe, exploit, or interact with the target beyond what the public sources allow.
- All API keys stay in environment variables; they are never written to disk.
- Respect the rate limits of the upstream services. The circuit breaker will back off automatically when a host starts returning errors.
- Output is for legitimate OSINT, threat intelligence, journalism, academic research, and defensive security work.
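The circuit breaker mentioned above backs off from hosts that keep failing. A minimal per-host sketch, assuming a consecutive-failure threshold and a fixed cooldown (the real thresholds in async_client.py are not documented here); the injectable clock makes the behaviour testable.

```python
import time
from typing import Callable


class CircuitBreaker:
    """Per-host breaker: opens after `threshold` consecutive failures and rejects
    requests until `cooldown` seconds pass, then lets a trial request through."""

    def __init__(self, threshold: int = 5, cooldown: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.threshold = threshold
        self.cooldown = cooldown
        self.clock = clock
        self.failures: dict[str, int] = {}
        self.opened_at: dict[str, float] = {}

    def allow(self, host: str) -> bool:
        opened = self.opened_at.get(host)
        if opened is None:
            return True  # closed: traffic flows
        return self.clock() - opened >= self.cooldown  # half-open after cooldown

    def record(self, host: str, ok: bool) -> None:
        if ok:
            self.failures.pop(host, None)
            self.opened_at.pop(host, None)  # a success closes the breaker
            return
        self.failures[host] = self.failures.get(host, 0) + 1
        if self.failures[host] >= self.threshold:
            self.opened_at[host] = self.clock()  # (re)open and restart the cooldown
```

A failed trial after the cooldown reopens the breaker immediately, because the failure count is still at or above the threshold.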
| Concern | Control | Where |
|---|---|---|
| Outbound to RFC1918 / loopback / cloud IMDS | SSRF guard runs on every URL before fetch (allowlist override via ESTORIDES_ALLOWED_HOSTS) | estorides_core/ssrf_guard.py |
| Web DoS / scraping | Sliding-window per-IP rate limit (default 30/min; tune via ESTORIDES_RATE_LIMIT) | estorides_core/audit.py |
| Compliance trail | Append-only JSONL audit log of every API call (timestamp, IP, query, sources, status, latency) at data/audit.jsonl | estorides_core/audit.py |
| Adversarial input | validate_query() rejects empty, oversize, control-char, bidi-override, and unsupported-type queries; bidi is rejected outright rather than silently stripped | estorides_core/validation.py |
| API key leakage | Keys read from env at call time, never logged, never written to disk | estorides_core/orchestrator.py (_resolve_auth) |
| Encrypted report delivery | age (https://age-encryption.org) opt-in via ?key=age1… on the export endpoint; graceful fallback to plaintext when age is missing | estorides_export/encryption.py |
| Trusting X-Forwarded-For | Only honoured when ESTORIDES_TRUST_PROXY=1 is set explicitly | estorides_web.py |
| Target observing a recon probe | Per-source contact class; --passive-only (or ESTORIDES_PASSIVE_ONLY=1) keeps only contact: none sources | estorides_core/source_loader.py, orchestrator._select_sources |
| Broker tying lookups to operator IP | Egress proxy/Tor (--proxy/--tor, ESTORIDES_HTTP_PROXY[_POOL]); fail-closed on missing SOCKS lib | estorides_core/async_client.py |
| DNS leak of investigated targets | Local resolution skipped when proxying (ESTORIDES_PROXY_REMOTE_DNS=1, default); literal-host guard still runs | estorides_core/async_client.py |
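The SSRF row above combines a literal-host check, an optional local-resolution check, and an allowlist override. A minimal sketch using the standard ipaddress module; url_allowed() and BLOCKED_HOSTNAMES are hypothetical, and this is not ssrf_guard.py. Note that 169.254.169.254, the usual cloud metadata address, is link-local and therefore blocked.

```python
import ipaddress
import socket
from typing import Iterable
from urllib.parse import urlsplit

# Hostnames refused outright; illustrative entries, not Estorides' list.
BLOCKED_HOSTNAMES = {"localhost", "metadata.google.internal"}


def _ip_blocked(ip) -> bool:
    # Private (RFC1918, ULA), loopback, link-local (covers 169.254.169.254),
    # and other non-routable ranges.
    return (ip.is_private or ip.is_loopback or ip.is_link_local
            or ip.is_reserved or ip.is_multicast or ip.is_unspecified)


def url_allowed(url: str, allowed_hosts: Iterable[str] = (), resolve: bool = True) -> bool:
    host = (urlsplit(url).hostname or "").rstrip(".").lower()
    if not host:
        return False
    if host in {h.lower() for h in allowed_hosts}:
        return True  # explicit allowlist override
    if host in BLOCKED_HOSTNAMES:
        return False
    try:
        # Literal-host leg: an IP address written directly into the URL.
        return not _ip_blocked(ipaddress.ip_address(host))
    except ValueError:
        pass  # not an IP literal, so it is a hostname
    if not resolve:
        return True  # e.g. proxying with remote DNS: skip the local lookup
    try:
        infos = socket.getaddrinfo(host, None)
    except socket.gaierror:
        return False  # unresolvable hosts are refused
    # Every address the name resolves to must be public.
    return all(not _ip_blocked(ipaddress.ip_address(info[4][0])) for info in infos)
```

A check-then-fetch guard like this does not by itself stop DNS rebinding; a stricter design would connect to the exact address that was checked.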
estorides_core/ontology.py loads the OpenSanctions OFAC SDN list
(CC-BY 4.0) once, indexes it by normalised name + alias, and stamps
every observation with {sanctioned, hits, fields}. The LLM analyst
stage then writes a "SANCTIONED — OFAC SDN match on …" line into the
brief so sanctions exposure is impossible to miss in the report.
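Indexing "by normalised name + alias" and stamping observations can be sketched like this. The entry shape, the fields checked ("name", "org"), the "sanctions" key, and the reading of fields as "which observation fields matched" are all assumptions; ontology.py may normalise differently or match more loosely than this exact-key lookup.

```python
import re
import unicodedata
from collections import defaultdict


def normalize_name(name: str) -> str:
    """Strip accents, casefold, turn punctuation into spaces, collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", name)
    no_marks = "".join(c for c in decomposed if not unicodedata.combining(c))
    return " ".join(re.sub(r"[\W_]+", " ", no_marks.casefold()).split())


class SanctionsIndex:
    def __init__(self, entries: list[dict]) -> None:
        # Assumed entry shape: {"id": str, "name": str, "aliases": [str, ...]}
        self._by_name: dict[str, set[str]] = defaultdict(set)
        for entry in entries:
            for label in [entry["name"], *entry.get("aliases", [])]:
                key = normalize_name(label)
                if key:  # never index an empty key
                    self._by_name[key].add(entry["id"])

    def stamp(self, observation: dict, fields: tuple = ("name", "org")) -> dict:
        """Attach {sanctioned, hits, fields} under a hypothetical "sanctions" key."""
        hits: set[str] = set()
        matched: list[str] = []
        for field_name in fields:
            value = observation.get(field_name)
            if not isinstance(value, str):
                continue
            ids = self._by_name.get(normalize_name(value))
            if ids:
                hits |= ids
                matched.append(field_name)
        observation["sanctions"] = {"sanctioned": bool(hits),
                                    "hits": sorted(hits), "fields": matched}
        return observation
```

Normalising both sides the same way lets "EXAMPLE TRADING CO" match an indexed "Example Trading Co." without fuzzy matching.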
Index characteristics:
- ~7 MB, low-tens-of-thousands of entries
- 24h lazy refresh
- Single-flight: concurrent first-loads share one fetch
- Best-effort disk cache at data/ontology_sdn.json
- Stale-on-error: keeps the previous snapshot if a refresh fails
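The lazy-refresh, single-flight, and stale-on-error properties listed above combine into one small pattern. A thread-based sketch under assumptions: LazySnapshot is hypothetical, the disk cache is omitted, and a failed refresh is simply retried on the next call.

```python
import threading
import time
from typing import Callable, Generic, Optional, TypeVar

T = TypeVar("T")


class LazySnapshot(Generic[T]):
    """Loads on first use, refreshes after `ttl` seconds, shares one fetch between
    concurrent callers, and keeps the previous snapshot if a refresh fails."""

    def __init__(self, loader: Callable[[], T], ttl: float = 24 * 3600,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._loader = loader
        self._ttl = ttl
        self._clock = clock
        self._lock = threading.Lock()
        self._value: Optional[T] = None
        self._loaded_at: Optional[float] = None

    def _fresh(self) -> bool:
        return self._loaded_at is not None and self._clock() - self._loaded_at < self._ttl

    def get(self) -> Optional[T]:
        if self._fresh():
            return self._value
        with self._lock:  # single-flight: one thread fetches, the others wait
            if self._fresh():  # another thread refreshed while we waited
                return self._value
            try:
                self._value = self._loader()
                self._loaded_at = self._clock()
            except Exception:
                if self._value is None:
                    raise  # nothing stale to serve on the very first load
                # stale-on-error: keep serving the previous snapshot
            return self._value
```

The second freshness check inside the lock is what turns N simultaneous first calls into a single download of the list.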
estorides_core/mitre_attack.py maps every observation to the
ATT&CK techniques it might support, by both source-keyed table
(40+ techniques across the threat-intel, breach, and web sources)
and keyword scan (catches malware families: mimikatz, cobalt
strike, lockbit, …). Aggregated techniques are exposed at the top
of the orchestrator result as result.mitre.techniques.
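The two mapping strategies above (a source-keyed table plus a keyword scan) and the aggregation into result.mitre.techniques can be sketched as follows. The technique IDs are real ATT&CK IDs, but both tables and the source keys are illustrative only and are not Estorides' mapping.

```python
import re

# Illustrative tables: real ATT&CK IDs, NOT the project's actual mapping.
SOURCE_TECHNIQUES = {
    "phishtank": ["T1566"],       # Phishing
    "feodo": ["T1071"],           # Application Layer Protocol (C2)
    "hibp_breaches": ["T1078"],   # Valid Accounts
}
KEYWORD_TECHNIQUES = {
    "mimikatz": ["T1003"],        # OS Credential Dumping
    "cobalt strike": ["T1071"],   # C2 over application-layer protocols
    "lockbit": ["T1486"],         # Data Encrypted for Impact
}


def map_observation(source: str, text: str) -> set[str]:
    techniques = set(SOURCE_TECHNIQUES.get(source, []))  # source-keyed leg
    lowered = text.lower()
    for keyword, ids in KEYWORD_TECHNIQUES.items():  # keyword-scan leg
        if re.search(r"\b" + re.escape(keyword) + r"\b", lowered):
            techniques.update(ids)
    return techniques


def aggregate(observations: list[tuple[str, str]]) -> list[str]:
    """Union across all observations, sorted for stable output."""
    out: set[str] = set()
    for source, text in observations:
        out |= map_observation(source, text)
    return sorted(out)
```

Word-boundary matching keeps "mimikatz" from firing on unrelated substrings, at the cost of missing obfuscated spellings.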
estorides_core/feeds.py ships three keyless feeds that the map
UI can layer on top of OSINT results:
| Feed | Source | Refresh | Notes |
|---|---|---|---|
| Earthquakes | USGS M2.5+ GeoJSON | 10 min | Always on |
| Fires | NASA FIRMS VIIRS_NOAA20_NRT CSV | 30 min | Requires ESTORIDES_FIRMS_KEY |
| News | GDELT 2.0 article list | 15 min | Coords unavailable; surfaces at (0,0) |
Endpoint: GET /api/feeds?bbox=min_lon,min_lat,max_lon,max_lat&no_cache=1.
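The bbox parameter uses min_lon,min_lat,max_lon,max_lat order, which matches GeoJSON's [lon, lat(, depth)] coordinate order in the USGS feed. A sketch of the filter with hypothetical helper names; bounding boxes that cross the antimeridian are not handled.

```python
from typing import Optional

BBox = tuple[float, float, float, float]


def parse_bbox(raw: Optional[str]) -> Optional[BBox]:
    """Parse "min_lon,min_lat,max_lon,max_lat"; missing or malformed means no filter."""
    if not raw:
        return None
    try:
        min_lon, min_lat, max_lon, max_lat = (float(x) for x in raw.split(","))
    except ValueError:
        return None
    return min_lon, min_lat, max_lon, max_lat


def in_bbox(feature: dict, bbox: Optional[BBox]) -> bool:
    if bbox is None:
        return True
    lon, lat = feature["geometry"]["coordinates"][:2]  # GeoJSON: lon, lat[, depth]
    min_lon, min_lat, max_lon, max_lat = bbox
    return min_lon <= lon <= max_lon and min_lat <= lat <= max_lat
```

Because GDELT news items sit at (0,0), a spatial filter like this only keeps them when the box contains the origin.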
estorides_llm/intelligence_prompts.py ships three prompt styles:
- system: the default Palantir-grade analyst with BLUF + confidence-graded findings.
- bluf: single-paragraph BLUF only, for time-critical briefs.
- tactical: adds THREAT PICTURE + COA-1/2/3 + IMMEDIATE ACTION.
Backend priority is configurable: ESTORIDES_BACKEND_PRIORITY=openai,ollama
or via the LLMManager constructor.
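The backend priority list describes an ordered fallback ending in the stub. A sketch, assuming each backend is a callable that returns text or fails, that empty output counts as failure (as with an embedding-only model), and that only the listed backends are tried; whether the real LLMManager appends unlisted backends is not stated. All names here are hypothetical.

```python
import os
from typing import Callable, Optional

Backend = Callable[[str], Optional[str]]  # returns text, or None/"" on failure

# Default order taken from the project layout (Ollama → OpenRouter → Anthropic → OpenAI).
DEFAULT_PRIORITY = ["ollama", "openrouter", "anthropic", "openai"]


def priority_order(env_value: Optional[str] = None) -> list[str]:
    raw = env_value if env_value is not None else os.environ.get("ESTORIDES_BACKEND_PRIORITY", "")
    names = [n.strip().lower() for n in raw.split(",") if n.strip()]
    return names or DEFAULT_PRIORITY


def analyse(prompt: str, backends: dict[str, Backend], order: list[str]) -> tuple[str, str]:
    """Try each backend in order; fall back to the stub if none returns text."""
    for name in order:
        backend = backends.get(name)
        if backend is None:
            continue  # not configured (e.g. no API key)
        try:
            text = backend(prompt)
        except Exception:
            text = None
        if text:  # empty output counts as a failure
            return name, text
    return "stub", "[stub] no LLM backend produced text"
```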
# All tests, ~10s
python3 _validate.py
# Individual suites
python3 _test_ssrf.py # 20 SSRF cases
python3 _test_validation.py # 16 input-validation cases
python3 _test_feeds.py # 3 real-time feeds
python3 _test_encryption.py # age encryption + graceful degradation
python3 _test_routes.py # Flask route table
bash _multi_test.sh              # end-to-end: 5 query types through the orchestrator
The validator exits 0 only when every check passes. CI runners can
grep FAIL to surface regressions.