A global AD4M Expression Language that resolves the canonical git+https:// URI scheme to file content or directory listings from any reachable Git host.
Built with the modern ALDK (@coasys/ad4m-ldk) pattern and scaffolded from ad4m-expression-language-template.
Status: v0.1 — full initial release. 96/96 tests green; typecheck clean; bundle ~45 KB.
Any AD4M agent that installs this Language can resolve URIs such as the following to an `Expression<T>` containing the file content or, for trailing-slash paths, the directory listing:

```text
git+https://github.com/coasys/we-schemas.git#main:schemas/community-home.json
git+https://github.com/coasys/we-schemas.git#v1.2.0:schemas/community-home.json
git+https://github.com/coasys/we-schemas.git#a1b2c3d4…:schemas/community-home.json
git+https://github.com/coasys/we-schemas.git#pull/123:schemas/community-home.json
git+https://github.com/coasys/we-schemas.git#main:schemas/
git+https://github.com/coasys/we-schemas.git?lines=10-80#main:schemas/community-home.json
git+https://github.com/coasys/we-schemas.git?jsonpath=%24.title#v1.2.0:schemas/community-home.json
```
The `git+https://` scheme is the form npm, Cargo, and pip already use to address Git remotes; it was not invented for AD4M. The fragment carries `<ref>:<subject>`, and query parameters carry transforms that are applied after the blob is fetched.
See the full URI grammar in the proposal.
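To make the fragment and query layout concrete, here is a minimal parser sketch. It is not the code in `src/uri.ts`: the type names (`GitUri`, `RefSpec`), the rule that only a full 40-hex-character ref counts as a SHA, and the error messages are assumptions for illustration, and only the `git+https://` form is handled.

```typescript
// Hypothetical types for illustration; the real parser lives in src/uri.ts.
type RefSpec =
  | { kind: "sha"; sha: string }
  | { kind: "pull"; number: number; variant: "head" | "merge" }
  | { kind: "name"; name: string }; // branch or tag: only the provider can tell which

interface GitUri {
  host: string;
  owner: string; // may contain "/" for GitLab-style subgroups
  repo: string;
  ref: RefSpec;
  path: string; // subject path inside the repository
  isTree: boolean; // trailing slash => directory listing
  params: Record<string, string>;
}

const SHA_RE = /^[0-9a-f]{40}$/i; // assumption: full SHA-1 object IDs only
const PULL_RE = /^pull\/(\d+)(?:\/(head|merge))?$/;

function parseRef(raw: string): RefSpec {
  if (SHA_RE.test(raw)) return { kind: "sha", sha: raw.toLowerCase() };
  const m = PULL_RE.exec(raw);
  if (m) {
    const variant = m[2] === "merge" ? "merge" : "head"; // bare pull/<n> means head
    return { kind: "pull", number: Number(m[1]), variant };
  }
  return { kind: "name", name: raw };
}

function parseGitUri(uri: string): GitUri {
  const prefix = "git+https://";
  if (!uri.startsWith(prefix)) throw new Error(`unsupported scheme: ${uri}`);
  const rest = uri.slice(prefix.length);

  // Split <locator>[?<query>]#<ref>:<subject>
  const hashAt = rest.indexOf("#");
  if (hashAt < 0) throw new Error("missing #<ref>:<subject> fragment");
  const fragment = decodeURIComponent(rest.slice(hashAt + 1));
  const beforeHash = rest.slice(0, hashAt);
  const qAt = beforeHash.indexOf("?");
  const locator = qAt < 0 ? beforeHash : beforeHash.slice(0, qAt);
  const query = qAt < 0 ? "" : beforeHash.slice(qAt + 1);

  // host/owner[/subgroup...]/repo(.git)
  const segs = locator.split("/").filter((s) => s.length > 0);
  if (segs.length < 3) throw new Error(`bad repository locator: ${locator}`);
  const host = segs[0].toLowerCase();
  const owner = segs.slice(1, -1).join("/");
  const repo = segs[segs.length - 1].replace(/\.git$/, "");

  // Git ref names cannot contain ":", so the first colon ends the ref.
  const colon = fragment.indexOf(":");
  if (colon < 0) throw new Error("fragment must be <ref>:<subject>");
  const path = fragment.slice(colon + 1);

  // Query parameters (lines, bytes, jsonpath, ...) arrive already percent-decoded.
  const params: Record<string, string> = {};
  new URLSearchParams(query).forEach((value, key) => {
    params[key] = value;
  });

  return { host, owner, repo, ref: parseRef(fragment.slice(0, colon)), path, isTree: path.endsWith("/"), params };
}
```

Distinguishing a branch from a tag needs a provider round-trip, which is why the sketch stops at `kind: "name"` for both.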
v0.1 ships the four originally staged phases together in a single release:
- Canonical URI parser: the full grammar, covering refs (SHA, branch, tag, `pull/<n>[/head|merge]`), subjects (file blob, tree listing), and query params (`lines`, `bytes`, `jsonpath`, `fields`, `format`, `recursive`). A round-trip serialiser normalises URIs into cache keys.
- GitHub provider: refs, tags, PRs, trees, blobs, and the default branch via the REST API.
- GitLab provider: refs, tags, MRs (mapped to `pull`), trees, and blobs via `/api/v4` (gitlab.com plus self-hosted hosts listed in `GITLAB_HOSTS_CSV`).
- Gitea provider: refs, tags, PRs, trees, and blobs via `/api/v1` (self-hosted hosts allowlisted via `GITEA_HOSTS_CSV`).
- Raw-HTTP fallback: opt-in via `ENABLE_RAW_HTTP_FALLBACK`. Best-effort `https://<host>/<o>/<r>/raw/<ref>/<path>` for unrecognised hosts. Refs are passed through as-is (no SHA resolution); tree listings and PR refs throw.
- Tiered LRU + TTL cache: separate caches for blobs and trees (no TTL; kept until LRU eviction), refs (60 s for branches and 24 h for tags by default), and default branches (1 h by default). Sizes and TTLs are configurable via template variables.
- Request deduplication: concurrent identical fetches share one in-flight promise, which protects hot URIs from request stampedes.
- Conditional ref polls: refs are re-fetched with `If-None-Match` and the cached ETag. A 304 leaves the cached SHA unchanged and extends its TTL.
- Sub-file ranges: `?lines=<from>-<to>` (1-indexed, inclusive) and `?bytes=<from>-<to>` (0-indexed, inclusive). Both clamp out-of-bounds values.
- JSONPath transforms: `?jsonpath=$.foo[*].bar`. Supported subset: root, field access, index, wildcard, and recursive descent.
- Field picker: `?fields=name,version` for shallow JSON projection.
- Format hints: `?format=raw|stripped|minified`. `stripped` removes `//` and `/* */` comments and trailing commas.
- Tree listings: a trailing-slash subject path returns the directory contents; `?recursive=1` returns the whole tree.
- PR refs: `pull/<n>` (head) and `pull/<n>/merge` resolve to PR commits via the provider's PR endpoint.
- Per-host auth: the `AUTH_TOKENS_JSON` template variable carries a host→token map. Requests are anonymous when no token is set.
- `isImmutable`: returns `true` for SHA-pinned URIs and `false` otherwise, so downstream caches can pin content.
- Comprehensive tests: 96 tests covering URI parsing, fragments, transforms, cache + dedup, providers (mock transport), and end-to-end resolution.
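The tiered cache and conditional ref polls listed above combine LRU eviction with per-tier TTLs. The following is a minimal sketch of such a cache, not the `TtlLruCache` class in `src/cache.ts`; the class name `LruTtlCache`, its methods, and the injectable clock are assumptions made for illustration.

```typescript
// Hypothetical sketch; not the TtlLruCache API in src/cache.ts.
// A Map iterates in insertion order, so re-inserting on access keeps the least
// recently used key first, ready for eviction.
class LruTtlCache<K, V> {
  private readonly entries = new Map<K, { value: V; expiresAt: number }>();
  private readonly maxEntries: number;
  private readonly ttlMs: number;
  private readonly now: () => number;

  // ttlMs = Infinity gives pure LRU behaviour (the blob and tree tiers).
  constructor(maxEntries: number, ttlMs: number = Infinity, now: () => number = () => Date.now()) {
    this.maxEntries = maxEntries;
    this.ttlMs = ttlMs;
    this.now = now;
  }

  get(key: K): V | undefined {
    const entry = this.entries.get(key);
    if (entry === undefined) return undefined;
    if (entry.expiresAt <= this.now()) {
      this.entries.delete(key); // expired: treat as a miss
      return undefined;
    }
    this.entries.delete(key); // move to the most-recently-used end
    this.entries.set(key, entry);
    return entry.value;
  }

  set(key: K, value: V): void {
    this.entries.delete(key);
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
    while (this.entries.size > this.maxEntries) {
      const oldest = this.entries.keys().next().value as K;
      this.entries.delete(oldest); // evict the least recently used entry
    }
  }

  // Extends an entry's TTL without changing its value, e.g. after a 304 on a ref poll.
  touch(key: K): void {
    const entry = this.entries.get(key);
    if (entry !== undefined) entry.expiresAt = this.now() + this.ttlMs;
  }
}
```

This sketch uses one TTL per cache instance; giving branch and tag refs different lifetimes in a shared ref tier would need `set()` to accept a per-entry TTL.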
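Request deduplication can be sketched as a map from request key to pending promise. This is an illustrative coalescer, not the API of `src/dedup.ts`; the class name `InFlightCoalescer` and the string key format are assumptions.

```typescript
// Hypothetical sketch of an in-flight request coalescer; not the src/dedup.ts API.
class InFlightCoalescer<T> {
  private readonly pending = new Map<string, Promise<T>>();

  // Returns the pending promise for `key` if one exists; otherwise starts `fetcher`
  // and shares its promise until it settles (fulfilled or rejected).
  run(key: string, fetcher: () => Promise<T>): Promise<T> {
    const existing = this.pending.get(key);
    if (existing !== undefined) return existing;
    const promise = fetcher().finally(() => {
      this.pending.delete(key); // later calls start a fresh fetch
    });
    this.pending.set(key, promise);
    return promise;
  }
}
```

The coalescer only remembers requests that are still running; finished results belong in the caches, so a failed fetch is not remembered and can be retried.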
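The sub-file range semantics above (1-indexed inclusive lines, 0-indexed inclusive bytes, clamping) fit in a few lines. This is a sketch rather than `src/fragment.ts`. The helper names are hypothetical, and so is the choice to return an empty result for an inverted range such as `5-2`.

```typescript
// Hypothetical helpers illustrating the documented range semantics.
function parseRange(spec: string): [number, number] {
  const m = /^(\d+)-(\d+)$/.exec(spec);
  if (!m) throw new Error(`bad range: ${spec}`);
  return [Number(m[1]), Number(m[2])];
}

// ?lines=<from>-<to>: 1-indexed, inclusive; out-of-bounds ends are clamped.
function sliceLines(text: string, spec: string): string {
  const [from, to] = parseRange(spec);
  const lines = text.split("\n");
  const start = Math.max(1, from) - 1; // 1-based inclusive -> 0-based start
  const end = Math.min(lines.length, to); // 1-based inclusive end == 0-based exclusive end
  return lines.slice(start, end).join("\n"); // inverted ranges yield ""
}

// ?bytes=<from>-<to>: 0-indexed, inclusive; clamped to the blob length.
function sliceBytes(blob: Uint8Array, spec: string): Uint8Array {
  const [from, to] = parseRange(spec);
  const start = Math.min(from, blob.length);
  const end = Math.min(to + 1, blob.length); // inclusive -> exclusive
  return blob.subarray(start, end); // inverted ranges yield an empty view
}
```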
Not yet implemented (candidates for later releases):

- Radicle provider: depends on the Radicle Service Language in the `git-link-language` successor spec.
- `git+ssh://` and `git+http://` schemes: reserved in the grammar, but v0.1 implements only `git+https://`. SSH requires a host SSH primitive, which the executor does not currently provide.
- Submodule traversal: follow commit-typed tree entries into the referenced repo. Niche; would be off by default behind a flag.
- Symlink following: v0.1 returns the symlink target string. An opt-in could resolve symlinks (mode `120000`) transitively.
- Per-org token maps: extend `AUTH_TOKENS_JSON` to nested `{ "github.com": { "myorg": "ghp_…", "*": "default" } }` for users with multiple PATs per host.
- Richer JSONPath: filter expressions, slices, and multi-key selectors (the full RFC 9535 grammar).
- Streaming for very large blobs: `bytes=` ranges currently fetch the whole blob first. A streaming path would matter only for binary blobs of several MB or more.
- `expression.create` for short-lived gist-like content: currently unsupported, because v0.1 treats `git+https://` addresses as editorial content. A future variant could integrate with the Gist API or the repo `contents` endpoint for create-via-PR flows.
- Inline image preview / MIME detection: return decoded binary as data URLs when consumers ask for them.
- Provider health surface: expose remaining rate-limit quotas via a custom query or signal.
Architecture:

```text
index.ts
  ↓
defineLanguage({ expression: { get, isImmutable, icon, … } })
  ↓
GitResolver (src/resolve.ts)
  ├── parse URI        (src/uri.ts)         canonical git+https grammar
  ├── ref → SHA        (provider)           cached + ETag-conditional
  ├── tree at SHA      (provider)           cached
  ├── blob fetch       (provider)           cached
  ├── fragment slice   (src/fragment.ts)    lines / bytes
  └── transforms       (src/transforms/)    jsonpath / fields / format
  ↓
Expression<T>
```

Providers (`src/providers/`):

```text
github.ts      github.com
gitlab.ts      gitlab.com + GITLAB_HOSTS_CSV
gitea.ts       GITEA_HOSTS_CSV
raw-http.ts    fallback, enabled by ENABLE_RAW_HTTP_FALLBACK
```
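The provider list above implies a host-based dispatch. Here is a minimal sketch of that selection step. `ProviderId`, `ProviderConfig`, and `selectProvider` are hypothetical names, not the registry API in `src/provider.ts`. The precedence order (GitLab before Gitea) and throwing for unknown hosts while the fallback is off are likewise assumptions.

```typescript
// Hypothetical sketch of host -> provider selection; not the src/provider.ts registry.
type ProviderId = "github" | "gitlab" | "gitea" | "raw-http";

interface ProviderConfig {
  gitlabHostsCsv: string; // GITLAB_HOSTS_CSV
  giteaHostsCsv: string; // GITEA_HOSTS_CSV
  enableRawHttpFallback: string; // ENABLE_RAW_HTTP_FALLBACK ("true" / "false")
}

// Parses a comma-separated host list, ignoring blanks and case.
function csvHosts(csv: string): Set<string> {
  return new Set(
    csv
      .split(",")
      .map((h) => h.trim().toLowerCase())
      .filter((h) => h.length > 0),
  );
}

function selectProvider(host: string, cfg: ProviderConfig): ProviderId {
  const h = host.toLowerCase();
  if (h === "github.com") return "github";
  if (h === "gitlab.com" || csvHosts(cfg.gitlabHostsCsv).has(h)) return "gitlab";
  if (csvHosts(cfg.giteaHostsCsv).has(h)) return "gitea";
  if (cfg.enableRawHttpFallback === "true") return "raw-http";
  throw new Error(`no provider for host ${host}`);
}
```

Keeping self-hosted instances behind explicit allowlists means an unknown host is never probed with a provider-specific API unless the operator opts in.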
Expressions are not AD4M-signed: `proof.signature` is empty. The trust model is:

- The URI is the assertion: readers who trust `github.com/coasys/we-schemas` are trusting whoever has write access to that repository.
- SHA-pinned URIs are content-addressed by Git: the provider returns either the exact content or fails.
- Mutable references (branch, tag, PR) can change. Tags are immutable by convention; branches and PRs move by design.
- Higher-level Languages can wrap `git+https://` URIs in signed link expressions when it matters who asserted that a given URI is interesting.
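`isImmutable` follows the second and third points above: only SHA-pinned URIs count as immutable, even though tags are immutable by convention. A minimal sketch of that check follows. The standalone function and the full 40-hex-character SHA-1 rule are assumptions; the real implementation may also accept other forms.

```typescript
// Hypothetical sketch of the documented rule: SHA-pinned => immutable.
const FULL_SHA1 = /^[0-9a-f]{40}$/i;

function isImmutable(uri: string): boolean {
  const hashAt = uri.indexOf("#");
  if (hashAt < 0) return false; // no ref at all: nothing is pinned
  const fragment = uri.slice(hashAt + 1);
  const colon = fragment.indexOf(":");
  const ref = colon < 0 ? fragment : fragment.slice(0, colon);
  // Branches, tags and pull/<n> refs can move, so only a full SHA qualifies.
  return FULL_SHA1.test(ref);
}
```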
Template variables:

```typescript
//!@ad4m-template-variable
const AUTH_TOKENS_JSON = "{}"; // host → token map, e.g. {"github.com":"ghp_…","gitlab.com":"glpat_…"}
//!@ad4m-template-variable
const BRANCH_REF_TTL_MS = "60000"; // branch refs: 60 s
//!@ad4m-template-variable
const TAG_REF_TTL_MS = "86400000"; // tag refs: 24 h
//!@ad4m-template-variable
const DEFAULT_BRANCH_TTL_MS = "3600000"; // default-branch lookups: 1 h
//!@ad4m-template-variable
const BLOB_CACHE_MAX_ENTRIES = "1024";
//!@ad4m-template-variable
const TREE_CACHE_MAX_ENTRIES = "256";
//!@ad4m-template-variable
const REF_CACHE_MAX_ENTRIES = "256";
//!@ad4m-template-variable
const ENABLE_RAW_HTTP_FALLBACK = "false"; // "true" enables the raw-HTTP provider for unrecognised hosts
//!@ad4m-template-variable
const GITEA_HOSTS_CSV = ""; // comma-separated self-hosted Gitea hosts
//!@ad4m-template-variable
const GITLAB_HOSTS_CSV = ""; // comma-separated self-hosted GitLab hosts (gitlab.com is always recognised)
```

Development:

```shell
NODE_ENV=development pnpm install
pnpm test        # 96 tests across 28 suites
pnpm typecheck   # tsc --noEmit
pnpm build       # → build/bundle.js (~45 KB)
```

Layout:

```text
├── index.ts                     # defineLanguage entry
├── esbuild.ts                   # Build script
├── package.json
├── tsconfig.json
├── src/
│   ├── types.ts                 # Expression, BlobMeta, TreeMeta, TreeEntry
│   ├── adapters.ts              # Transport / Storage / Runtime / Signing
│   ├── adapters-deno.ts         # Concrete adapters wrapping ad4m:host
│   ├── auth.ts                  # AUTH_TOKENS_JSON parser
│   ├── cache.ts                 # TtlLruCache (used for all four tiers)
│   ├── dedup.ts                 # In-flight request coalescer
│   ├── uri.ts                   # Canonical Git URI parser + serialiser
│   ├── provider.ts              # GitProvider interface + registry
│   ├── providers/github.ts      # GitHub REST API client
│   ├── providers/gitlab.ts      # GitLab REST API client
│   ├── providers/gitea.ts       # Gitea REST API client
│   ├── providers/raw-http.ts    # Generic raw-HTTP fallback
│   ├── fragment.ts              # Line / byte range slicing
│   ├── transforms/index.ts      # Transform pipeline
│   ├── transforms/jsonpath.ts   # JSONPath subset
│   ├── transforms/fields.ts     # Field picker
│   ├── transforms/format.ts     # raw / stripped / minified
│   └── resolve.ts               # End-to-end resolver
└── tests/
    ├── uri.test.ts
    ├── auth.test.ts
    ├── cache.test.ts
    ├── fragment.test.ts
    ├── transforms.test.ts
    ├── providers-github.test.ts
    └── resolve.test.ts
```
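`auth.ts` is listed as the `AUTH_TOKENS_JSON` parser. A minimal sketch of parsing that flat host→token map might look like the following. The function names are hypothetical. Comparing hosts case-insensitively and rejecting non-string values (which also rejects the nested per-org form not supported in v0.1) are assumptions.

```typescript
// Hypothetical sketch of parsing AUTH_TOKENS_JSON into a host -> token lookup.
function parseAuthTokens(json: string): Map<string, string> {
  const parsed: unknown = JSON.parse(json || "{}"); // empty string => no tokens
  if (parsed === null || typeof parsed !== "object" || Array.isArray(parsed)) {
    throw new Error("AUTH_TOKENS_JSON must be a JSON object of host -> token");
  }
  const tokens = new Map<string, string>();
  for (const [host, token] of Object.entries(parsed as Record<string, unknown>)) {
    if (typeof token !== "string") throw new Error(`token for ${host} must be a string`);
    tokens.set(host.toLowerCase(), token);
  }
  return tokens;
}

// Returns the configured token, or undefined to make the request anonymously.
function tokenFor(host: string, tokens: Map<string, string>): string | undefined {
  return tokens.get(host.toLowerCase());
}
```

How the token is attached to a request (header name and scheme) is provider-specific and is left out of this sketch.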