Commit fabe507

rustyconover and claude committed
Add README.md and CLAUDE.md
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
1 parent 0c1e59f commit fabe507

2 files changed

Lines changed: 270 additions & 0 deletions

CLAUDE.md

Lines changed: 137 additions & 0 deletions
@@ -0,0 +1,137 @@
# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Project Overview

A minimal **VGI (Vector Gateway Interface)** worker that exposes one SQL scalar
function, `easter_date(year)`, returning the Western (Gregorian) Easter Sunday
date for a year. It is intentionally small: a clean reference example of a VGI
scalar function. The computation is pure standard-library arithmetic (the
Anonymous Gregorian Computus); there are no network calls or external data.

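For orientation, the Anonymous Gregorian Computus fits in a few lines of integer
arithmetic. The sketch below follows the published Meeus/Jones/Butcher steps; it
is not a copy of the repository's `_easter_sunday`, and the name `easter_sunday`
is chosen here for illustration only.

```python
from datetime import date


def easter_sunday(year: int) -> date:
    """Western (Gregorian) Easter Sunday via the Meeus/Jones/Butcher steps.

    Illustrative sketch; the repository's `_easter_sunday` may differ in detail.
    """
    a = year % 19                 # position in the 19-year lunar cycle
    b, c = divmod(year, 100)      # century, and year within the century
    d, e = divmod(b, 4)
    f = (b + 8) // 25
    g = (b - f + 1) // 3
    h = (19 * a + b - d - g + 15) % 30      # lunar offset term
    i, k = divmod(c, 4)
    l = (32 + 2 * e + 2 * i - h - k) % 7    # weekday offset term
    m = (a + 11 * h + 22 * l) // 451        # late-date correction
    month, day = divmod(h + l - 7 * m + 114, 31)
    return date(year, month, day + 1)


print(easter_sunday(2025))  # 2025-04-20
```

Everything is integer division and remainders, which is why the worker needs
nothing beyond the standard library for the computation itself.
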
VGI lets a worker publish catalogs/schemas/functions that DuckDB can `ATTACH`
and query natively, exchanging values as Apache Arrow IPC.

## Architecture

Effectively everything lives in two files, plus a small test helper:

- **`easter_worker.py`** (~160 lines), the whole worker:
  - `_easter_sunday(year)`: the Computus, returns `datetime.date`.
  - `EasterDateFunction(ScalarFunction)`: maps `pa.Int64Array` years to a
    `pa.date32()` array. Output type is set explicitly via
    `Returns(arrow_type=pa.date32())`; nulls propagate. `Meta` carries the
    function name (`easter_date`), description, and `FunctionExample`s used for
    catalog introspection.
  - `_EASTER_CATALOG`: `Catalog(name="easter")` with a single `main` schema
    holding `EasterDateFunction`.
  - `EasterCatalog(ReadOnlyCatalogInterface)`: advertises `data_version`
    (`DATA_VERSION = "1.0.0"`) and `implementation_version` (`GIT_COMMIT`, from
    `VGI_EASTER_GIT_COMMIT`, else `"unknown"`).
  - `EasterWorker(Worker)`: binds the catalog + interface. `main()` runs stdio
    mode; `main_http()` runs the HTTP server.
- **`serve.py`**: three lines that import `EasterWorker` and call `main_http()`.
  This is the HTTP entrypoint.
- **`conftest.py`**: puts the repo root on `sys.path` so tests can
  `import easter_worker` / `import serve`.

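The `conftest.py` trick is small enough to sketch. The helper name
`add_to_sys_path` is hypothetical, and the real file may simply do this inline
at import time.

```python
import sys
from pathlib import Path


def add_to_sys_path(root: Path) -> None:
    """Prepend `root` to sys.path once so modules inside it are importable."""
    entry = str(root)
    if entry not in sys.path:
        sys.path.insert(0, entry)


# In a real conftest.py the root would be the file's own directory:
#     add_to_sys_path(Path(__file__).resolve().parent)
# The current working directory is used here so the sketch runs anywhere.
add_to_sys_path(Path.cwd())
```

The membership check keeps repeated imports of the module from piling duplicate
entries onto `sys.path`.
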
### The scalar-function pattern

A VGI scalar function subclasses `ScalarFunction` and implements a `compute`
classmethod whose params/return are annotated:

- inputs: `Annotated[pa.Int64Array, Param(doc=...)]`, one Arrow array per argument.
- output: `Annotated[pa.Array[Any], Returns(arrow_type=pa.date32())]`. When the
  result type isn't the natural inference of the input, set it explicitly with
  `Returns(arrow_type=...)`.
- null handling is manual: `compute` iterates `year.to_pylist()` and maps
  `None -> None`.

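The manual null handling can be illustrated without importing `pyarrow` or
`vgi`. In the sketch below a Python list with `None` entries stands in for the
result of `year.to_pylist()`, and the helper names (`map_nullable`,
`to_date32_days`, `new_years_day`) are hypothetical. It also shows what a
`date32` value holds: a count of whole days since 1970-01-01.

```python
from datetime import date
from typing import Callable, Optional, Sequence

EPOCH = date(1970, 1, 1)  # Arrow date32 counts whole days from this date


def map_nullable(
    years: Sequence[Optional[int]],
    fn: Callable[[int], date],
) -> list[Optional[date]]:
    """Apply `fn` element-wise, passing nulls (None) through untouched."""
    return [None if y is None else fn(y) for y in years]


def to_date32_days(values: Sequence[Optional[date]]) -> list[Optional[int]]:
    """Encode dates as days since the Unix epoch, the date32 storage form."""
    return [None if v is None else (v - EPOCH).days for v in values]


def new_years_day(year: int) -> date:
    """Stand-in for the real per-year computation."""
    return date(year, 1, 1)


dates = map_nullable([1970, None, 1971], new_years_day)
print(dates)                  # [datetime.date(1970, 1, 1), None, datetime.date(1971, 1, 1)]
print(to_date32_days(dates))  # [0, None, 365]
```

In the worker itself the list of dates would presumably be handed back to
pyarrow with the `date32` type rather than encoded by hand.
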
## Dependencies & Python version

Requires **Python 3.13+**, managed with `uv`. Deps are declared inline as
PEP 723 script metadata in `easter_worker.py` and `serve.py` (excerpt, shown
without the enclosing `# /// script` and `# ///` marker lines):

```python
# dependencies = ["vgi[http,oauth]", "vgi-rpc[sentry]"]
# [tool.uv.sources]
# vgi = { path = "../vgi-python" }
# vgi-rpc = { path = "../vgi-rpc" }
```

In development, `vgi` and `vgi-rpc` resolve against the sibling checkouts
`~/Development/vgi-python` and `~/Development/vgi-rpc`.

## Commands

```bash
# Run the worker in stdio mode (DuckDB spawns it as a subprocess)
uv run --python 3.13 easter_worker.py

# Run the HTTP server
VGI_SIGNING_KEY=dev uv run --python 3.13 serve.py --host 0.0.0.0 --port 8000

# Unit tests (pytest). The --rootdir/-o flags stop pytest from picking up an
# upstream pyproject that injects --mypy --ruff.
uv run --python 3.13 \
  --with pytest --with pyarrow --with ../vgi-python --with ../vgi-rpc \
  pytest tests/ --rootdir=. -o "addopts=" -q
```

There is no `.venv` checked in (it's gitignored); the `uv run --with ...`
invocation above resolves a throwaway environment. If you create a project venv,
prefer `.venv/bin/pytest` over bare `pytest`.

## Testing

### Unit tests (`tests/test_easter.py`)

- `_easter_sunday` against a table of known Easter dates, including the extremes
  (1818 → Mar 22, the earliest possible; 1943 → Apr 25, the latest).
- `EasterDateFunction.compute` over an `Int64Array` batch, asserting the result
  type is `date32` and values match.
- Null propagation through `compute`.

### SQL integration tests (`test/sql/`)

sqllogictest files exercised through the real DuckDB VGI extension, gated on
`require-env VGI_EASTER_WORKER`:

- `easter_catalog.test`: `vgi_catalogs()` discovery, `data_version_spec`
  (asserts `1.0.0`; `implementation_version` is the varying git SHA and is *not*
  asserted), `ATTACH ... (TYPE vgi)`, and `information_schema.schemata`.
- `easter_function.test`: scalar calls (`easter.main.easter_date(2025)`),
  `typeof(...) = 'DATE'`, and `easter_date(NULL::BIGINT) IS NULL`.

Run them with the DuckDB `unittest` binary built with the VGI extension, with
`VGI_EASTER_WORKER` set to a worker LOCATION (stdio command or HTTP URL).

## ATTACH syntax

The VGI extension auto-detects transport from LOCATION:

```sql
-- stdio: DuckDB spawns the worker
ATTACH 'easter' AS easter (TYPE vgi, LOCATION 'uv run --python 3.13 easter_worker.py');

-- HTTP: worker running as a server (requires httpfs, which the extension auto-loads)
ATTACH 'easter' AS easter (TYPE vgi, LOCATION 'http://localhost:8000');
```

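How the extension tells the two LOCATION forms apart is not spelled out here.
One plausible rule, sketched below purely as an assumption, is to treat
anything with an `http` or `https` scheme as a server URL and everything else
as a command to spawn. The function name `guess_transport` is hypothetical and
is not part of the extension.

```python
from urllib.parse import urlsplit


def guess_transport(location: str) -> str:
    """Return "http" for http(s) URLs and "stdio" for anything else.

    Hypothetical rule for illustration; the extension's real check may differ.
    """
    scheme = urlsplit(location).scheme.lower()
    return "http" if scheme in ("http", "https") else "stdio"


print(guess_transport("http://localhost:8000"))                  # http
print(guess_transport("uv run --python 3.13 easter_worker.py"))  # stdio
```
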
## Environment variables

- `VGI_SIGNING_KEY`: stable key for state-token signing (HTTP server).
- `VGI_EASTER_GIT_COMMIT`: reported as the catalog `implementation_version`.
- `VGI_HTTP_PORT` (default 8000), `VGI_HTTP_HOST`: HTTP bind address.
- `VGI_WORKER_DEBUG=1`: debug logging.

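As a rough picture of how these variables and their defaults fit together, the
sketch below gathers them into one dictionary. The function `read_settings` and
its keys are hypothetical; most of these variables are presumably read inside
the `vgi` library rather than by the worker, and the `0.0.0.0` host default is
an assumption.

```python
import os
from typing import Mapping


def read_settings(env: Mapping[str, str] = os.environ) -> dict:
    """Collect the documented variables into one dictionary (illustrative)."""
    return {
        # Falls back to "unknown", matching the documented GIT_COMMIT default.
        "git_commit": env.get("VGI_EASTER_GIT_COMMIT", "unknown"),
        "http_port": int(env.get("VGI_HTTP_PORT", "8000")),
        # Assumption: "0.0.0.0" stands in for "all interfaces".
        "http_host": env.get("VGI_HTTP_HOST", "0.0.0.0"),
        "debug": env.get("VGI_WORKER_DEBUG") == "1",
        # None when unset; the HTTP server needs a real value.
        "signing_key": env.get("VGI_SIGNING_KEY"),
    }


print(read_settings({"VGI_HTTP_PORT": "9000", "VGI_WORKER_DEBUG": "1"}))
```

Passing the environment in as a mapping keeps the function easy to exercise in
tests without touching the real process environment.
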
## Conventions

- Keep the worker self-contained and dependency-light: the value of this repo is
  being a *minimal* example. Don't add network calls or external data sources.
- The catalog name (`easter`), schema (`main`), and function name (`easter_date`)
  are part of the public surface the SQL tests assert against; changing any of
  them means updating both `easter_worker.py` and `test/sql/`.
- Bump `DATA_VERSION` when the function's observable output semantics change;
  `easter_catalog.test` asserts the current value.

README.md

Lines changed: 133 additions & 0 deletions
@@ -0,0 +1,133 @@
# vgi-easter

A minimal [VGI (Vector Gateway Interface)](https://github.com/Query-farm) worker
that computes the date of **Western (Gregorian) Easter Sunday** for a given year
and exposes it to DuckDB as a SQL scalar function.

```sql
ATTACH 'easter' AS easter (TYPE vgi, LOCATION 'uv run --python 3.13 easter_worker.py');

SELECT easter_date(2025);
-- 2025-04-20

SELECT year, easter_date(year) AS easter
FROM range(2020, 2031) t(year);
-- 2020 2020-04-12
-- 2021 2021-04-04
-- ...
```

The function `easter_date(year)` takes a `BIGINT` year and returns a `DATE`,
computed with the Anonymous Gregorian algorithm (the Meeus/Jones/Butcher
*Computus*). It is pure standard-library arithmetic (no network calls, no
external data), which makes this repo a clean, self-contained example of a VGI
scalar-function worker.

## How it works

VGI lets a worker process publish catalogs, schemas, and functions that DuckDB
can `ATTACH` and query as if they were native. Data crosses the boundary as
Apache Arrow IPC, so values stay columnar end to end.

This worker publishes one catalog:

```
easter (catalog, data version 1.0.0)
└── main (schema)
    └── easter_date(year BIGINT) -> DATE
```

`year` propagates nulls: a null year yields a null date.

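The catalog, schema, and function nesting is what makes a fully qualified name
such as `easter.main.easter_date` resolvable. The sketch below models that
hierarchy with plain dictionaries; it is not how VGI or DuckDB store catalogs,
and `CATALOGS` and `resolve` are hypothetical names.

```python
# Plain dictionaries standing in for catalog -> schema -> function.
CATALOGS = {
    "easter": {
        "data_version": "1.0.0",
        "schemas": {"main": {"easter_date": "easter_date(year BIGINT) -> DATE"}},
    }
}


def resolve(qualified_name: str) -> str:
    """Look up `catalog.schema.function` and return its signature string."""
    catalog, schema, function = qualified_name.split(".")
    return CATALOGS[catalog]["schemas"][schema][function]


print(resolve("easter.main.easter_date"))  # easter_date(year BIGINT) -> DATE
```
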
The entire implementation lives in [`easter_worker.py`](easter_worker.py):

- `_easter_sunday(year)`: the Computus, returning a `datetime.date`.
- `EasterDateFunction`: a `ScalarFunction` mapping an `Int64Array` of years to
  a `date32` array, with metadata and SQL examples for catalog introspection.
- `EasterCatalog` / `EasterWorker`: wire the function into a VGI catalog that
  advertises a stable `data_version` (`1.0.0`) and a git-SHA
  `implementation_version` (from `VGI_EASTER_GIT_COMMIT`).

## Requirements

- Python **3.13+**
- [`uv`](https://docs.astral.sh/uv/) for dependency management

Dependencies are declared inline as [PEP 723](https://peps.python.org/pep-0723/)
script metadata in `easter_worker.py` and `serve.py`. During development they
resolve against the sibling checkouts `../vgi-python` and `../vgi-rpc`.

## Running

The worker supports both VGI transports.

### stdio (DuckDB spawns the worker)

DuckDB runs the worker as a subprocess and talks to it over stdin/stdout. No
server to manage:

```sql
ATTACH 'easter' AS easter (TYPE vgi, LOCATION 'uv run --python 3.13 easter_worker.py');
SELECT easter_date(2025);
DETACH easter;
```

### HTTP

Start the worker as an HTTP server (`serve.py` calls `EasterWorker.main_http()`):

```bash
VGI_SIGNING_KEY=dev uv run --python 3.13 serve.py --host 0.0.0.0 --port 8000
```

Then attach over HTTP (the VGI extension auto-loads `httpfs`):

```sql
ATTACH 'easter' AS easter (TYPE vgi, LOCATION 'http://localhost:8000');
SELECT easter_date(2025);
```

## Testing

### Unit tests (pytest)

The `tests/` suite checks the Computus against known Easter dates (including the
March 22 / April 25 extremes) and the Arrow `compute()` path including null
propagation:

```bash
uv run --python 3.13 \
  --with pytest --with pyarrow --with ../vgi-python --with ../vgi-rpc \
  pytest tests/ --rootdir=. -o "addopts=" -q
```

The `--rootdir=. -o "addopts="` flags keep pytest from picking up an upstream
`pyproject.toml` that injects `--mypy --ruff`. `conftest.py` puts the repo root
on `sys.path` so the tests can `import easter_worker`.

### SQL integration tests (sqllogictest)

`test/sql/` contains [sqllogictest](https://duckdb.org/dev/sqllogictest/intro)
files that run against the worker through the real DuckDB VGI extension:

- `easter_catalog.test`: catalog discovery, `data_version_spec`, `ATTACH` and
  schema introspection.
- `easter_function.test`: scalar evaluation, the `DATE` result type, and null
  propagation.

Both are gated on `require-env VGI_EASTER_WORKER`, so point that at a worker
LOCATION (a stdio command or an HTTP URL) and run them with the DuckDB
`unittest` binary built with the VGI extension.

## Environment variables

| Variable                          | Purpose                                                            |
| --------------------------------- | ------------------------------------------------------------------ |
| `VGI_SIGNING_KEY`                 | Stable key for state-token signing (required for the HTTP server). |
| `VGI_EASTER_GIT_COMMIT`           | Git SHA reported as the catalog's `implementation_version`.        |
| `VGI_HTTP_PORT` / `VGI_HTTP_HOST` | HTTP bind address (defaults: `8000` / all interfaces).             |
| `VGI_WORKER_DEBUG`                | Set to `1` for debug logging.                                      |

## License

Copyright © Query.Farm. All rights reserved.
