Pinakes is a single static Go binary — a local, single-user tool that exposes
the same in-process engine over three surfaces: a stdio MCP server
(pinakes mcp), a loopback REST surface (pinakes serve), and one-shot CLI
verbs. It reads from a fixed set of public biomedical data sources (Ensembl,
UniProt, PDB, ChEMBL, gnomAD, ClinVar, openFDA, PubChem, AlphaFold,
ClinicalTrials.gov, NCBI virus and protein) and turns their responses into
deterministic, byte-reproducible,
provenance-stamped records.
This document describes the security posture of this engine — the downloadable binary. It is not the hosted service; anything that depends on multi-tenant infrastructure, account credentials, or controlled-access data is out of scope and called out explicitly in Out of scope / Not yet provided.
The engine runs as a local process under one user's account, over public data. Its security goals, in priority order, are:
- Integrity / tamper-evidence of the data it produces: a result either reproduces exactly or is rejected loudly.
- Bounded egress: the process only talks to the declared `https://` source endpoints, and only through one transport seam.
- Resource bounds for the local process: a hostile or runaway input cannot trivially exhaust memory or stall the engine.
It is not an authentication or authorization boundary. The REST surface is unauthenticated and binds loopback only; anyone with local access to the machine (or the loopback port) can drive the engine. That is by design for a local tool and is stated plainly below.
Please report security issues privately through GitHub's private vulnerability reporting:
Repository → Security tab → Report a vulnerability (GitHub Security Advisories).
This opens a private advisory visible only to the maintainers. Please do not open a public issue for a suspected vulnerability, and please do not post details on social media or mailing lists before a fix is available.
When reporting, include where possible:
- the version (`pinakes version`) and platform,
- the surface involved (`mcp`, `serve`, or a CLI verb),
- a minimal reproduction (the query / request and the observed behavior),
- the impact you believe it has.
There is no bug-bounty program. We will acknowledge reports and work in good faith toward a fix; we make no guaranteed response-time SLA for this open-source binary.
Security fixes target the latest released version on the current release line. There is no long-term-support back-port commitment for older tags. Always update to the most recent release (see Supply-Chain Integrity for how to install and verify it).
| Version | Supported |
|---|---|
| Latest release | ✅ |
| Older releases | ❌ |
The engine's outbound network surface is deliberately narrow.
- Per-connector endpoint allowlist. Each source is configured by a declarative spec (`connectors.Spec`). At load time `Spec.Validate` (`connectors/spec.go`) requires every queryable source to declare at least one endpoint and rejects any endpoint whose URL does not begin with `https://`; the check is labelled in code as the SSRF / egress-allowlist guard. The declared endpoints are therefore the intended egress set: every connector builds its request URLs against its declared base, and paginates by extracting cursor/page tokens (e.g. openFDA's `search_after` is read out of the `Link` header as a parameter, never followed as an arbitrary URL), so a connector does not construct requests to undeclared hosts. All shipped specs (`surface/provider/specs/*.yaml`) declare only `https://` endpoints.
- Single transport seam. Connectors do not open their own arbitrary transports. Each connector receives an injected `httpDoer` (e.g. `connectors/ensembl/ensembl.go`: "httpDoer is the only injected transport seam"), and the engine builds every connector with one shared `*http.Client` from `provider.Config.HTTPClient` (`surface/provider/engine.go`). The contract package documents this as the invariant: "the injected httpDoer is the ONLY transport seam" (`engine/contract/doc.go`). Because there is one chokepoint, outbound traffic is auditable and substitutable in one place.
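The load-time scheme check described above can be pictured with a small Go sketch. This is not the code in `connectors/spec.go`: the `endpointSpec` type, the `validateEndpoints` name, and the example URLs are all hypothetical, and the extra "must have a host" test is an addition of this sketch rather than something the source states.

```go
package main

import (
	"errors"
	"fmt"
	"net/url"
	"strings"
)

// endpointSpec is a hypothetical stand-in for a connector spec; the real
// connectors.Spec has more fields and may be shaped differently.
type endpointSpec struct {
	Name      string
	Endpoints []string
}

// validateEndpoints requires at least one endpoint and rejects any endpoint
// that does not begin with https:// or that parses without a host.
func validateEndpoints(s endpointSpec) error {
	if len(s.Endpoints) == 0 {
		return errors.New(s.Name + ": no endpoints declared")
	}
	for _, e := range s.Endpoints {
		if !strings.HasPrefix(e, "https://") {
			return fmt.Errorf("%s: endpoint %q does not begin with https://", s.Name, e)
		}
		// Extra check in this sketch only: the URL must carry a host.
		u, err := url.Parse(e)
		if err != nil || u.Host == "" {
			return fmt.Errorf("%s: endpoint %q has no valid host", s.Name, e)
		}
	}
	return nil
}

func main() {
	good := endpointSpec{Name: "demo-ok", Endpoints: []string{"https://api.example.org/v1"}}
	bad := endpointSpec{Name: "demo-bad", Endpoints: []string{"http://api.example.org/v1"}}
	fmt.Println(validateEndpoints(good))
	fmt.Println(validateEndpoints(bad))
}
```

As the text notes, a check of this kind constrains the scheme only; it says nothing about which host the URL names.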
Limits, stated honestly. The `https://` requirement is a scheme check, not a
DNS/IP egress firewall: it does not by itself prevent a maliciously crafted spec
from naming an internal host, nor does it pin TLS certificates beyond Go's
defaults. The shared `*http.Client` also uses Go's default redirect policy (it
does not install a `CheckRedirect` host restriction), so an upstream that
responds with an HTTP redirect can send a request to a host outside the declared
set. The protection is therefore threefold: the configured sources are trusted
public endpoints, the endpoint set is declared, reviewed config, and all egress
flows through one injected client. It is not a network-layer allowlist.
Operators who need hard network isolation should run the binary in a
sandbox/network policy of their own (see also Out of scope).
By default the engine stores and transmits no credentials: every source is
public and works with no key. Most connector specs declare auth: none, and
Spec.Validate restricts auth to a closed set (connectors/spec.go:
validAuth).
Optional API keys. Two NCBI sources (ncbi-protein, ncbi-virus) declare
auth: api_key so a user may supply their own NCBI key to raise their own
upstream rate limit. This is entirely optional — with no key, behavior is
byte-identical and the transport is unwrapped. When a key is configured it is
handled conservatively:
- It is set only via the CLI (`pinakes config set-key`, which reads the key from stdin, never an argv argument), and stored in `~/.config/pinakes/config.yaml` with `0600` permissions (mode-checked on read, refused if group/world-readable), or supplied via `PINAKES_NCBI_API_KEY`.
- It is attached only to the outbound `*http.Request` inside a source-aware `http.RoundTripper` (`surface/provider/credentials.go`). Connectors never see it, and it is never part of `RawRecord.Raw`, the logical-record hash, or any manifest (hash purity; proven by a with/without-key byte-identical-hash test). So a key changes rate limits, never results.
- It is never exposed to the model / MCP surface (key config is CLI-only, not an idl verb or MCP tool), never logged, and redacted by default in `pinakes config get/list`.
The engine exposes three local surfaces from one binary (cmd/pinakes/main.go).
- REST (`pinakes serve`) is loopback-only. The default bind is `127.0.0.1:8080` (`defaultServeAddr`, `cmd/pinakes/main.go`). The bind host is enforced, not merely defaulted: `api.Serve` calls `requireLoopback` (`api/router.go`), which refuses any address whose host is empty (an empty host would bind all interfaces) or is neither `localhost` nor a loopback IP. The code states this is "a defense-in-depth fence for an unauthenticated surface, not auth." The server also sets a `ReadHeaderTimeout` (10s) and drains gracefully on SIGINT/SIGTERM.
- Request-body cap. POST bodies are capped at 8 MiB (`maxBodyBytes = 8 << 20`, `api/handlers.go`) via `http.MaxBytesReader`, so a runaway body becomes a bounded decode error rather than an OOM. The decoder also sets `DisallowUnknownFields` and rejects trailing content, so malformed or stale input is refused locally as a 400 rather than silently accepted.
- MCP framing DoS guard. The stdio MCP transport (`mcp/server.go`) caps a single inbound frame body via `maxFrameBytes` (16 MiB) and the per-frame header block via `maxHeaderBytes` (8 KiB). A peer-supplied `Content-Length` is validated against `maxFrameBytes` before any allocation, so a hostile frame cannot drive an unbounded allocation (`readFrame`, `mcp/server.go`).
There is no authentication or authorization on any surface. See Out of scope.
Integrity is the engine's primary security property, and it is enforced structurally rather than asserted.
- Content-addressed, self-verifying storage. Snapshot objects are keyed by the lowercase-hex SHA-256 of their own bytes (`engine/snapshot/backend.go`: `keyRe`, the `Backend` contract). `Put` and `Get` both re-hash the bytes and return `ErrCorrupt` on any mismatch: "a mismatch is a hard, loud integrity failure, never silently tolerated." The key shape (bare 64-hex) also prevents path traversal in the filesystem backend.
- Offline re-derivation on verify. `Resolver.Verify` (`engine/resolver/verify.go`) does not trust a presented manifest. It reloads the pinned raw record set from the content-addressed store, re-runs the exact assembly the resolver uses, and compares the presented manifest to the re-derived one field by field, ending with a canonical-JSON backstop over every derived field. A tampered logical-record hash, a tampered manifest field, or a corrupted snapshot is rejected with a precise error (`ErrHashMismatch`, `ErrManifestDrift`, store `ErrCorrupt`, etc.). Verify also refuses to attest an unpinned / non-reproducible / immature-source manifest rather than rubber-stamp it.
- Determinism is the tamper-evidence. Because the same query against a pinned snapshot re-derives a byte-identical logical-record set and manifest (`engine/contract/doc.go`, the determinism spine), any alteration of the stored bytes or the manifest changes the re-derived result and fails Verify. This is proven offline: the `case-studies/ebola` tests (`case-studies/ebola/ebola_test.go`) run with all upstream HTTP denied (`denyTransport`) and assert that (a) Verify reproduces the committed manifest with zero upstream calls, (b) a tampered `logical_record_hash` is refused with `ErrHashMismatch`, and (c) a corrupted snapshot fails loudly. CI runs this suite with `-race` on every push (`.github/workflows/ci.yml`).
This is tamper-evidence (alterations are detected and rejected), not tamper-prevention (the on-disk store is ordinary local files under the user's own permissions).
The engine bounds resource use for the local process at several layers. These are abuse/runaway bounds for a single-user tool, not multi-tenant quotas.
- Inbound body / frame caps. REST 8 MiB body cap and MCP 16 MiB frame / 8 KiB header caps, as above (`api/handlers.go`, `mcp/server.go`).
- Result-size hard ceiling. `NormalizedQuery.Limit` (`engine/contract/query.go`) is a hard ceiling on the records a fresh retrieval may return; the resolver enforces it after retrieval (`engine/resolver/resolver.go`), truncating to the first `Limit` records (in the connector's imposed deterministic order) and downgrading the result to best-effort. A truncated set is never marked Complete and never materialized as a reproducible snapshot. A pinned read ignores `Limit` and reproduces the immutable snapshot verbatim. Known limit: this ceiling bounds what is normalized/hashed/stored/served; it does not yet bound the connector's upstream fetch memory (a tracked follow-up, documented in `query.go`).
- Upstream rate Governor. Outbound calls go through the Governor (`engine/governor`), which caps the aggregate call rate per source against that source's declared `upstream_limit` (token bucket), backs off adaptively on upstream `429` / `Retry-After`, and trips a circuit breaker on a hard upstream outage. In this binary the Governor binds a single tenant (`surface/provider/engine.go`), so it functions here as a per-source outbound-rate bound and outage-tolerance mechanism rather than a multi-tenant fairness guarantee.
- Complete-or-fail (no silent partial results). Retrieval reconciles the fetched count against an authoritative source total; a shortfall or a mid-walk source mutation downgrades to best-effort instead of silently returning a partial set, and `Complete` is never emitted without a reconciled count (`engine/contract/doc.go`, `engine/contract/query.go`). A corrupt store surfaces `ErrCorrupt`, "never a silent empty result" (`engine/resolver/resolver.go`). The security value: you cannot be quietly handed a truncated answer that looks authoritative.
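The interplay between the result ceiling and the Complete / best-effort distinction can be sketched as a single decision function. This is a simplification and not the resolver's code: `applyLimit`, the `result` type, the `completeness` values, and the convention that a non-positive limit means "no ceiling" are assumptions of this example.

```go
package main

import "fmt"

type completeness string

const (
	complete   completeness = "complete"
	bestEffort completeness = "best-effort"
)

// result is a hypothetical, simplified view of a retrieval outcome.
type result struct {
	Records      []string
	Completeness completeness
	Truncated    bool
}

// applyLimit enforces the ceiling after retrieval. A truncated set is always
// best-effort, and a set is complete only when the fetched count was
// reconciled against the source's authoritative total.
func applyLimit(records []string, limit int, reconciled bool) result {
	if limit > 0 && len(records) > limit {
		return result{Records: records[:limit], Completeness: bestEffort, Truncated: true}
	}
	if !reconciled {
		return result{Records: records, Completeness: bestEffort}
	}
	return result{Records: records, Completeness: complete}
}

func main() {
	records := []string{"r1", "r2", "r3", "r4"}

	r := applyLimit(records, 2, true)
	fmt.Println(len(r.Records), r.Completeness, r.Truncated)

	r = applyLimit(records, 10, true)
	fmt.Println(len(r.Records), r.Completeness, r.Truncated)

	r = applyLimit(records, 10, false)
	fmt.Println(len(r.Records), r.Completeness, r.Truncated)
}
```

Note that the truncation happens after the records are already in memory, which is the known limit stated in the list: a ceiling of this kind bounds what is kept, not what was fetched.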
The engine enforces source license posture in code, which doubles as a data-handling control.
- `Spec.Validate` → `validateLicense` (`connectors/spec.go`) rejects any spec that marks a copyleft (share-alike) source as redistributable: a copyleft source "must not set redistribute:true"; it is "copyleft-isolated, never co-metered."
- `Spec.Posture` (`connectors/spec.go`) derives a `contract.LicensePosture` (`engine/contract/license.go`): copyleft sources resolve to `CopyleftIsolated`, non-redistributable sources to `IndexProxy`, and only permissively licensed sources to `HostServe`. The `Redistributable()` predicate permits redistribution only for `HostServe`.
- Sources carrying attribution obligations (CC-BY / CC-BY-SA / ODbL) are required to declare a non-empty attribution string at spec-load time: `validateLicense` (`connectors/spec.go`, gated by `requiresAttribution`) rejects such a spec if its attribution string is empty. That string is then carried as structured license metadata on the result manifest and in the JSON response envelope (`engine/contract/license.go` `License.Attribution`, attached to `contract.Manifest` and surfaced via `idl/responses.go`). Note: the `export` verb in this binary is synchronous and manifest-returning. It Resolves the query and returns a manifest with a synthetic local locator (`surface/core/handler.go` `Export` / `exportLocation`); it does not write export-artifact bytes. Byte-level embedding of attribution into materialized export files is intended (see the docstrings in `engine/contract/license.go`) but is not yet implemented here. In this engine the attribution obligation is conveyed as manifest/license metadata, not embedded inside an export file.
For this local binary the practical effect is that the copyleft fence is encoded
and validated at spec-load time. The broader serve/cache posture predicates are
present as frozen contract API for a future hosted layer and are documented as
such in engine/contract/license.go.
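The license rules described in this section reduce to a small validation step and a posture derivation. The sketch below is a reading of the prose, not the code in `connectors/spec.go` or `engine/contract/license.go`: the `licenseSpec` fields and function names are hypothetical, the mapping from a `redistribute` flag to `HostServe` is an assumption, and the `GatedPassThrough` posture mentioned elsewhere in this document is left out.

```go
package main

import (
	"errors"
	"fmt"
)

type posture string

const (
	HostServe        posture = "HostServe"
	IndexProxy       posture = "IndexProxy"
	CopyleftIsolated posture = "CopyleftIsolated"
)

// Redistributable permits redistribution only for HostServe.
func (p posture) Redistributable() bool { return p == HostServe }

// licenseSpec is a hypothetical, simplified license block of a connector spec.
type licenseSpec struct {
	Copyleft            bool
	Redistribute        bool
	RequiresAttribution bool
	Attribution         string
}

// validateLicense rejects a copyleft source marked redistributable, and an
// attribution-bearing license with an empty attribution string.
func validateLicense(l licenseSpec) error {
	if l.Copyleft && l.Redistribute {
		return errors.New("copyleft source must not set redistribute:true")
	}
	if l.RequiresAttribution && l.Attribution == "" {
		return errors.New("license requires a non-empty attribution string")
	}
	return nil
}

// postureOf derives the posture; copyleft takes precedence over everything.
func postureOf(l licenseSpec) posture {
	switch {
	case l.Copyleft:
		return CopyleftIsolated
	case !l.Redistribute:
		return IndexProxy
	default:
		return HostServe
	}
}

func main() {
	specs := map[string]licenseSpec{
		"permissive":   {Redistribute: true, RequiresAttribution: true, Attribution: "Example Source"},
		"share-alike":  {Copyleft: true, RequiresAttribution: true, Attribution: "Example Source"},
		"bad-copyleft": {Copyleft: true, Redistribute: true},
	}
	for _, name := range []string{"permissive", "share-alike", "bad-copyleft"} {
		l := specs[name]
		if err := validateLicense(l); err != nil {
			fmt.Println(name, "rejected:", err)
			continue
		}
		p := postureOf(l)
		fmt.Println(name, p, "redistributable:", p.Redistributable())
	}
}
```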
Releases are built and published by a tag-triggered GitHub Actions pipeline. The following are what the workflow files in this repository actually do:
- CI gate before release. `release.yml` runs the full CI suite (`make ci`) on the tagged commit and blocks the build/sign/publish job until it passes, so a signature can never attest to an unproven artifact.
- Pinned action SHAs. Every GitHub Action in `ci.yml` and `release.yml` is pinned to a full commit SHA (with a version comment), not a mutable tag.
- Reproducible static builds. `.goreleaser.yaml` builds with `CGO_ENABLED=0`, `-trimpath`, stripped ldflags, and `mod_timestamp` set to the commit timestamp, giving a fully static, reproducible binary.
- Checksums + keyless signing. GoReleaser emits a SHA-256 `checksums.txt` and cosign keyless-signs it via Sigstore + GitHub OIDC (`.goreleaser.yaml` `signs:`; `release.yml` `id-token: write`). Verifying the one signature over `checksums.txt` transitively verifies every archive.
- SBOM. A CycloneDX SBOM is generated per archive via syft (`.goreleaser.yaml` `sboms:`).
- Vulnerability scan. CI runs `govulncheck` pinned to `v1.3.0` (not `@latest`) so the scan is reproducible (`ci.yml`).
- Dependency updates. Dependabot keeps both the SHA-pinned Actions and the Go modules current weekly, preserving the pins (`.github/dependabot.yml`).
- Installer verification. `install.sh` always verifies the SHA-256 checksum and refuses to install on mismatch; when `cosign` is present it additionally verifies the keyless signature with an exact certificate identity pinned to this repo's `release.yml` workflow at the installed tag, refusing to install on failure. When `cosign` is absent it warns and proceeds with checksum-only verification.
What this is not: there is no SLSA provenance attestation, no third-party audit, and no reproducible-build attestation beyond the goreleaser settings above. Claims here are limited to what the committed workflow files perform.
The engine sends no telemetry, analytics, or usage reports. It makes outbound
network calls only to the declared source endpoints, through the single injected
transport seam (above). The verify path is fully offline — proven by the
case-studies/ebola tests, which run with all upstream HTTP denied and assert
zero network calls. A search of the non-test code finds no analytics,
crash-reporting, or phone-home client of any kind.
(The word "telemetry" appears in a few code comments — e.g. in-process Governor queue-depth counters and the maturity-level docstring — referring to internal operational counters, not any outbound reporting.)
This binary deliberately does not provide the following. These are honest non-goals for a local single-user tool, or concerns that belong to a future hosted service rather than this engine:
- No authentication or authorization. No accounts, API keys, sessions, roles, or per-request authorization. The REST surface is unauthenticated and loopback-only by enforcement; treat local access to the machine (and the loopback port) as full access to the engine.
- No multi-tenancy / isolation guarantees. The Governor is multi-tenant capable but this binary binds a single tenant (`surface/provider/engine.go`); there is no tenant isolation, quota, or fair-sharing boundary in the downloadable tool.
- No controlled-access source pass-through. Controlled-access / DUA data (the `GatedPassThrough` posture in `engine/contract/license.go`) is not handled by this binary. No connector here is gated, and no controlled credential flow exists. This is a future hosted-service concern.
- No biosecurity / dual-use screening. The engine retrieves and normalizes public records as-is; it does not screen sequences or content for biosecurity or dual-use concerns. Such screening, if it ever exists, is not part of this binary.
- No compliance certifications. There is no SOC 2, ISO 27001, HIPAA, or any other certification, no external security audit, and no penetration test associated with this binary. Do not infer any such guarantee from this document.
- No network-layer egress firewall, TLS pinning, or sandbox. Egress is bounded by declared `https://` config and a single transport seam, not by a network policy. Operators needing hard isolation should run the binary inside their own sandbox/network controls.
- No connector-side fetch-memory bound. The result `Limit` ceiling bounds what is stored/served, but a connector can still fetch a large upstream into memory before the ceiling applies (tracked follow-up, `engine/contract/query.go`).
- No at-rest encryption of the local snapshot store. Snapshots are ordinary files under the user's account, protected by OS file permissions only.
If you are deploying Pinakes in a setting that needs any of the above, treat this engine as a local component and supply those controls in the surrounding environment.