| 1 | +# CLAUDE.md |
| 2 | + |
| 3 | +This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository. |
| 4 | + |
| 5 | +## Project Overview |
| 6 | + |
| 7 | +A minimal **VGI (Vector Gateway Interface)** worker that exposes one SQL scalar |
| 8 | +function, `easter_date(year)`, returning the Western (Gregorian) Easter Sunday |
| 9 | +date for a year. It is intentionally small — a clean reference example of a VGI |
| 10 | +scalar function. The computation is pure standard-library arithmetic (the |
| 11 | +Anonymous Gregorian Computus); there are no network calls or external data. |
| 12 | + |
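For orientation, the Anonymous Gregorian Computus mentioned above can be sketched in a few lines of integer arithmetic. This is a minimal standalone version, not the code in `easter_worker.py`; the function name `easter_sunday` and the intermediate variable names are illustrative assumptions.

```python
from datetime import date


def easter_sunday(year: int) -> date:
    """Western (Gregorian) Easter Sunday via the Anonymous Gregorian algorithm.

    Illustrative sketch only; the worker's own helper may differ in detail.
    """
    a = year % 19                      # position in the 19-year Metonic cycle
    b, c = divmod(year, 100)           # century and year within the century
    d, e = divmod(b, 4)                # leap-century correction terms
    f = (b + 8) // 25
    g = (b - f + 1) // 3
    h = (19 * a + b - d - g + 15) % 30  # offset to the paschal full moon
    i, k = divmod(c, 4)
    l = (32 + 2 * e + 2 * i - h - k) % 7  # days forward to the following Sunday
    m = (a + 11 * h + 22 * l) // 451
    month, day = divmod(h + l - 7 * m + 114, 31)
    return date(year, month, day + 1)


if __name__ == "__main__":
    print(easter_sunday(2025))
```

Everything here is standard-library integer arithmetic, which matches the "no network calls or external data" property described above.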
| 13 | +VGI lets a worker publish catalogs/schemas/functions that DuckDB can `ATTACH` |
| 14 | +and query natively, exchanging values as Apache Arrow IPC. |
| 15 | + |
| 16 | +## Architecture |
| 17 | + |
| 18 | +Effectively everything lives in two files, plus a small pytest helper: |
| 19 | + |
| 20 | +- **`easter_worker.py`** (~160 lines) — the whole worker: |
| 21 | + - `_easter_sunday(year)` — the Computus, returns `datetime.date`. |
| 22 | + - `EasterDateFunction(ScalarFunction)` — maps `pa.Int64Array` years to a |
| 23 | + `pa.date32()` array. Output type is set explicitly via |
| 24 | + `Returns(arrow_type=pa.date32())`; nulls propagate. `Meta` carries the |
| 25 | + function name (`easter_date`), description, and `FunctionExample`s used for |
| 26 | + catalog introspection. |
| 27 | + - `_EASTER_CATALOG` — `Catalog(name="easter")` with a single `main` schema |
| 28 | + holding `EasterDateFunction`. |
| 29 | + - `EasterCatalog(ReadOnlyCatalogInterface)` — advertises `data_version` |
| 30 | + (`DATA_VERSION = "1.0.0"`) and `implementation_version` (`GIT_COMMIT`, from |
| 31 | + `VGI_EASTER_GIT_COMMIT`, else `"unknown"`). |
| 32 | + - `EasterWorker(Worker)` — binds the catalog + interface. `main()` runs stdio |
| 33 | + mode; `main_http()` runs the HTTP server. |
| 34 | +- **`serve.py`** — three lines: imports `EasterWorker` and calls `main_http()`. |
| 35 | + This is the HTTP entrypoint. |
| 36 | +- **`conftest.py`** — puts the repo root on `sys.path` so tests can |
| 37 | + `import easter_worker` / `import serve`. |
| 38 | + |
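The `conftest.py` behaviour described above (putting the repo root on `sys.path`) usually looks something like the following. This is a hedged sketch, not the file's actual contents; `ensure_on_path` is a hypothetical helper name.

```python
import sys
from pathlib import Path


def ensure_on_path(root: Path) -> None:
    """Prepend `root` to sys.path once, so top-level modules import by name."""
    entry = str(root)
    if entry not in sys.path:
        sys.path.insert(0, entry)


# In a real conftest.py the root would typically be the file's own directory:
#     ensure_on_path(Path(__file__).resolve().parent)
```

The membership check keeps repeated imports of `conftest.py` from adding duplicate entries.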
| 39 | +### The scalar-function pattern |
| 40 | + |
| 41 | +A VGI scalar function subclasses `ScalarFunction` and implements a `compute` |
| 42 | +classmethod whose parameters and return value are annotated: |
| 43 | + |
| 44 | +- inputs: `Annotated[pa.Int64Array, Param(doc=...)]` — Arrow array per argument. |
| 45 | +- output: `Annotated[pa.Array[Any], Returns(arrow_type=pa.date32())]`. When |
| 46 | + the result type cannot be inferred from the input type (here `int64` in, |
| 47 | + `date32` out), set it explicitly with `Returns(arrow_type=...)`. |
| 48 | +- null handling is manual: `compute` iterates `year.to_pylist()` and maps |
| 49 | + `None -> None`. |
| 50 | + |
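The manual null handling in the list above amounts to a row-wise map that passes `None` through untouched. Below is a minimal stdlib-only sketch of that idea; `map_years`, `date32_days`, and the stand-in row function are hypothetical names, and the real `compute` works on a `pa.Int64Array` via `to_pylist()` rather than on plain lists. The second helper shows what a `date32` value represents: whole days since 1970-01-01.

```python
from datetime import date
from typing import Callable, List, Optional, Sequence

UNIX_EPOCH = date(1970, 1, 1)


def map_years(
    years: Sequence[Optional[int]],
    row_fn: Callable[[int], date],
) -> List[Optional[date]]:
    """Apply row_fn to each non-null year; nulls map to nulls."""
    return [None if y is None else row_fn(y) for y in years]


def date32_days(d: Optional[date]) -> Optional[int]:
    """Days since the UNIX epoch, the integer an Arrow date32 slot stores."""
    return None if d is None else (d - UNIX_EPOCH).days


if __name__ == "__main__":
    # Stand-in row function (January 1st), used only to keep the sketch short.
    dates = map_years([2025, None, 1970], lambda y: date(y, 1, 1))
    print(dates, [date32_days(d) for d in dates])
```

In the worker, the resulting list of `datetime.date` and `None` values would then be turned back into an Arrow array of the declared output type.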
| 51 | +## Dependencies & Python version |
| 52 | + |
| 53 | +Requires **Python 3.13+**, managed with `uv`. Deps are declared inline as |
| 54 | +PEP 723 script metadata in `easter_worker.py` and `serve.py`: |
| 55 | + |
| 56 | +```python |
| 57 | +# dependencies = ["vgi[http,oauth]", "vgi-rpc[sentry]"] |
| 58 | +# [tool.uv.sources] |
| 59 | +# vgi = { path = "../vgi-python" } |
| 60 | +# vgi-rpc = { path = "../vgi-rpc" } |
| 61 | +``` |
| 62 | + |
| 63 | +In development, `vgi` and `vgi-rpc` resolve against the sibling checkouts |
| 64 | +`~/Development/vgi-python` and `~/Development/vgi-rpc`. |
| 65 | + |
| 66 | +## Commands |
| 67 | + |
| 68 | +```bash |
| 69 | +# Run the worker in stdio mode (DuckDB spawns it as a subprocess) |
| 70 | +uv run --python 3.13 easter_worker.py |
| 71 | + |
| 72 | +# Run the HTTP server |
| 73 | +VGI_SIGNING_KEY=dev uv run --python 3.13 serve.py --host 0.0.0.0 --port 8000 |
| 74 | + |
| 75 | +# Unit tests (pytest). The --rootdir/-o flags stop pytest from picking up an |
| 76 | +# upstream pyproject that injects --mypy --ruff. |
| 77 | +uv run --python 3.13 \ |
| 78 | + --with pytest --with pyarrow --with ../vgi-python --with ../vgi-rpc \ |
| 79 | + pytest tests/ --rootdir=. -o "addopts=" -q |
| 80 | +``` |
| 81 | + |
| 82 | +There is no `.venv` checked in (it's gitignored); the `uv run --with ...` |
| 83 | +invocation above resolves a throwaway environment. If you create a project venv, |
| 84 | +prefer `.venv/bin/pytest` over bare `pytest`. |
| 85 | + |
| 86 | +## Testing |
| 87 | + |
| 88 | +### Unit tests — `tests/test_easter.py` |
| 89 | + |
| 90 | +- `_easter_sunday` against a table of known Easter dates, including the extremes |
| 91 | + (1818 → Mar 22, the earliest possible; 1943 → Apr 25, the latest). |
| 92 | +- `EasterDateFunction.compute` over an `Int64Array` batch, asserting the result |
| 93 | + type is `date32` and values match. |
| 94 | +- Null propagation through `compute`. |
| 95 | + |
| 96 | +### SQL integration tests — `test/sql/` |
| 97 | + |
| 98 | +sqllogictest files exercised through the real DuckDB VGI extension, gated on |
| 99 | +`require-env VGI_EASTER_WORKER`: |
| 100 | + |
| 101 | +- `easter_catalog.test` — `vgi_catalogs()` discovery, `data_version_spec` |
| 102 | + (asserts `1.0.0`; `implementation_version` is the varying git SHA and is *not* |
| 103 | + asserted), `ATTACH ... (TYPE vgi)`, and `information_schema.schemata`. |
| 104 | +- `easter_function.test` — scalar calls (`easter.main.easter_date(2025)`), |
| 105 | + `typeof(...) = 'DATE'`, and `easter_date(NULL::BIGINT) IS NULL`. |
| 106 | + |
| 107 | +Run them with the DuckDB `unittest` binary built with the VGI extension, with |
| 108 | +`VGI_EASTER_WORKER` set to a worker LOCATION (stdio command or HTTP URL). |
| 109 | + |
| 110 | +## ATTACH syntax |
| 111 | + |
| 112 | +The VGI extension auto-detects transport from LOCATION: |
| 113 | + |
| 114 | +```sql |
| 115 | +-- stdio: DuckDB spawns the worker |
| 116 | +ATTACH 'easter' AS easter (TYPE vgi, LOCATION 'uv run --python 3.13 easter_worker.py'); |
| 117 | + |
| 118 | +-- HTTP: worker running as a server (requires httpfs, which the extension auto-loads) |
| 119 | +ATTACH 'easter' AS easter (TYPE vgi, LOCATION 'http://localhost:8000'); |
| 120 | +``` |
| 121 | + |
| 122 | +## Environment variables |
| 123 | + |
| 124 | +- `VGI_SIGNING_KEY` — stable key for state-token signing (HTTP server). |
| 125 | +- `VGI_EASTER_GIT_COMMIT` — reported as the catalog `implementation_version`. |
| 126 | +- `VGI_HTTP_PORT` (default 8000), `VGI_HTTP_HOST` — HTTP bind address. |
| 127 | +- `VGI_WORKER_DEBUG=1` — debug logging. |
| 128 | + |
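As a rough illustration of how these variables might be consumed, the sketch below reads them into a small settings object. It is an assumption-laden example: `WorkerSettings` and `read_settings` are hypothetical names, the actual parsing happens inside the worker and the `vgi` libraries and may differ, and no default is assumed for `VGI_HTTP_HOST` because none is documented here. `VGI_SIGNING_KEY` is deliberately left out.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class WorkerSettings:
    git_commit: str           # reported as implementation_version
    http_port: int            # documented default: 8000
    http_host: Optional[str]  # no documented default, so None when unset
    debug: bool               # True only when VGI_WORKER_DEBUG is "1"


def read_settings(env: Mapping[str, str] = os.environ) -> WorkerSettings:
    """Read the documented variables from a mapping (os.environ by default)."""
    return WorkerSettings(
        git_commit=env.get("VGI_EASTER_GIT_COMMIT", "unknown"),
        http_port=int(env.get("VGI_HTTP_PORT", "8000")),
        http_host=env.get("VGI_HTTP_HOST"),
        debug=env.get("VGI_WORKER_DEBUG") == "1",
    )
```

Taking the environment as a parameter keeps the function easy to exercise in tests without mutating `os.environ`.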
| 129 | +## Conventions |
| 130 | + |
| 131 | +- Keep the worker self-contained and dependency-light — the value of this repo is |
| 132 | + being a *minimal* example. Don't add network calls or external data sources. |
| 133 | +- The catalog name (`easter`), schema (`main`), and function name (`easter_date`) |
| 134 | + are part of the public surface the SQL tests assert against; changing any of |
| 135 | + them means updating both `easter_worker.py` and `test/sql/`. |
| 136 | +- Bump `DATA_VERSION` when the function's observable output semantics change; |
| 137 | + `easter_catalog.test` asserts the current value. |