22 changes: 17 additions & 5 deletions .github/workflows/ci.yml
@@ -97,11 +97,19 @@ jobs:
GH_TOKEN: ${{ github.token }}

integration:
name: End-to-end SQL on ${{ matrix.os }}
name: SQL E2E (${{ matrix.transport }}) on ${{ matrix.os }}
needs: resolve-haybarn
strategy:
fail-fast: false
matrix:
# Run the SAME sqllogictest suite over every VGI transport. The vgi
# extension picks the transport from the ATTACH LOCATION string that
# run-integration.sh builds per $TRANSPORT:
# subprocess : `.venv/bin/python rerank_worker.py` (stdio; extension spawns it)
# http : `http://127.0.0.1:<port>` (worker booted with --http)
# unix : `unix:///tmp/rerank-<pid>.sock` (worker booted with --unix)
os: [ubuntu-latest, macos-latest]
transport: [subprocess, http, unix]
include:
- { os: ubuntu-latest, asset: haybarn_unittest-linux-amd64.zip }
- { os: macos-latest, asset: haybarn_unittest-osx-arm64.zip }
@@ -118,8 +126,10 @@ jobs:
- name: Set up Python 3.13
run: uv python install 3.13

- name: Install the worker (from the lockfile)
run: uv sync --frozen --python 3.13
- name: Install the worker (from the lockfile, with the http extra)
# The `http` extra pulls in waitress so the worker can serve `--http`.
# Harmless for the subprocess/unix legs; required for the http leg.
run: uv sync --frozen --python 3.13 --extra http

# Pre-warm the fastembed/ONNX model cache so the first ATTACH+score isn't
# paying the ~80 MB download inline.
@@ -147,7 +157,9 @@ jobs:
UNITTEST="$PWD/$(find hb -name 'haybarn-unittest' -type f | head -1)"
chmod +x "$UNITTEST"
echo "HAYBARN_UNITTEST=$UNITTEST" >> "$GITHUB_ENV"
echo "VGI_RERANK_WORKER=$PWD/.venv/bin/python $PWD/rerank_worker.py" >> "$GITHUB_ENV"
echo "WORKER_CMD=$PWD/.venv/bin/python $PWD/rerank_worker.py" >> "$GITHUB_ENV"

- name: Run extension integration suite
- name: Run extension integration suite (${{ matrix.transport }})
run: ci/run-integration.sh
env:
TRANSPORT: ${{ matrix.transport }}
100 changes: 86 additions & 14 deletions ci/README.md
@@ -1,7 +1,7 @@
# CI: the vgi-calendar worker integration suite
# CI: the vgi-rerank worker integration suite

[`.github/workflows/ci.yml`](../.github/workflows/ci.yml) runs the unit tests
and this repo's sqllogictest suite (`test/sql/*.test`) against the vgi-calendar
and this repo's sqllogictest suite (`test/sql/*.test`) against the vgi-rerank
VGI worker through the **real DuckDB `vgi` extension** on every push / PR.

## How it works (no C++ build)
@@ -11,9 +11,9 @@ Rather than building the vgi DuckDB extension from source, CI drives a
runner, published in Haybarn's releases) and installs the **signed** `vgi`
extension from the Haybarn community channel:

1. **Install the worker** — `uv sync --frozen` into a venv. `calendar_worker.py`
is a self-contained PEP 723 stdio worker the extension can spawn via
`uv run calendar_worker.py`.
1. **Install the worker** — `uv sync --frozen --extra http` into a venv.
`rerank_worker.py` is a self-contained PEP 723 worker the extension can spawn
via stdio, or that the harness can boot over HTTP / an AF_UNIX socket.
2. **Download the runner** — the matching `haybarn_unittest-*` asset per
platform from the latest Haybarn release.
3. **Preprocess** — the standalone runner links none of the extensions the
@@ -24,20 +24,92 @@ extension from the Haybarn community channel:
`INSTALL vgi FROM community;` right before each bare `LOAD vgi;`. `require-env`
and everything else pass through untouched.
4. **Run** — [`run-integration.sh`](run-integration.sh) stages the preprocessed
tree, points `VGI_CALENDAR_WORKER` at `uv run calendar_worker.py`, warms the
extension cache once, then runs the suite in a single `haybarn-unittest`
invocation. Any failed assertion exits non-zero and fails the job.
tree, resolves `VGI_RERANK_WORKER` (the ATTACH `LOCATION`) per the
`$TRANSPORT` it's run with (see below), warms the extension cache once, then
runs the suite in a single `haybarn-unittest` invocation. Any failed
assertion exits non-zero and fails the job.

## Transport matrix (subprocess | http | unix)

The same `test/sql/*.test` suite is run over all three VGI transports — the
extension picks the transport from the `LOCATION` string the `.test` files
`ATTACH`, and `run-integration.sh` builds that string from `$TRANSPORT`:

| `TRANSPORT` | `VGI_RERANK_WORKER` (LOCATION) | How the worker is reached |
|--------------|---------------------------------------|---------------------------|
| `subprocess` | `.venv/bin/python rerank_worker.py` | extension spawns the worker per query; Arrow IPC over stdin/stdout (default) |
| `http` | `http://127.0.0.1:<port>` | harness boots `rerank_worker.py --http --port 0 --port-file <f>`, waits for the port-file, then ATTACHes that URL |
| `unix` | `unix:///tmp/rerank-<pid>.sock` | harness boots `rerank_worker.py --unix <sock>`, waits for the socket to appear, then ATTACHes it |

The CI `integration` job is a `transport: [subprocess, http, unix]` × `os`
matrix; each leg runs `ci/run-integration.sh` with `TRANSPORT=<t>`. Run a single
transport locally with e.g. `TRANSPORT=http ci/run-integration.sh`.
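
As a minimal sketch of the mapping in the table above: the helper name `location_for` and its argument order are hypothetical (the real logic is the `case "$TRANSPORT"` block in `run-integration.sh`), and the port and socket path are made-up sample values.

```bash
# Hypothetical helper: print the ATTACH LOCATION for a given transport.
#   $1 = transport (subprocess | http | unix)
#   $2 = stdio worker command (used as-is for subprocess)
#   $3 = port (http) or absolute socket path (unix)
location_for() {
  case "$1" in
    subprocess) printf '%s\n' "$2" ;;
    http)       printf 'http://127.0.0.1:%s\n' "$3" ;;
    unix)       printf 'unix://%s\n' "$3" ;;
    *)          echo "unknown transport '$1'" >&2; return 2 ;;
  esac
}

location_for subprocess ".venv/bin/python rerank_worker.py"
location_for http "" 8123
location_for unix "" /tmp/rerank-42.sock
```

Because the socket path is absolute, prefixing it with `unix://` yields the three-slash `unix:///...` form shown in the table.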

### Port / readiness discovery

- **http**: the worker writes its auto-selected port to `--port-file`
atomically (tmp + rename), so the harness watches for that file to appear and
reads the port from it — it does **not** parse stdout. Boot line:
`rerank_worker.py --http --port 0 --port-file <f>`.
- **unix**: the worker binds the AF_UNIX socket and prints `UNIX:<abs-path>`;
the harness polls for the socket file (`test -S`) to appear. Boot line:
`rerank_worker.py --unix <sock>`.

Both out-of-band server processes are booted with cwd = the repo root (so they
resolve the worker's relative resources / model cache) and are trap-killed on
exit.
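
The port-file handshake can be sketched roughly as follows. This is an illustration, not the harness itself: `wait_for_port_file` is a hypothetical helper, the background subshell merely stands in for the worker, and the port `8123` is a made-up value.

```bash
# Hypothetical helper: poll until a non-empty port-file appears, then print the port.
#   $1 = port-file path, $2 = maximum number of attempts (0.5 s apart)
wait_for_port_file() {
  attempts=0
  while [ "$attempts" -lt "$2" ]; do
    if [ -s "$1" ]; then
      tr -d '[:space:]' < "$1"   # strip the trailing newline / stray whitespace
      echo
      return 0
    fi
    attempts=$((attempts + 1))
    sleep 0.5
  done
  return 1
}

# Stand-in for the worker: write to a temporary name, then rename into place,
# so a reader never observes a half-written file.
port_file="$(mktemp -u "${TMPDIR:-/tmp}/demo-port.XXXXXX")"
( sleep 1; echo 8123 > "$port_file.tmp"; mv "$port_file.tmp" "$port_file" ) &

port="$(wait_for_port_file "$port_file" 10)"
echo "worker ready on port $port"
wait
rm -f "$port_file"
```

The rename is what makes polling with `-s` safe: the file either does not exist yet or already holds the complete port number.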

### HTTP transport needs the `httpfs` extension (resolved, not gated)

The vgi extension implements HTTP transport on top of DuckDB's **httpfs**
extension, so an `http://` ATTACH fails at bind time with

> `Binder Error: VGI HTTP transport requires the httpfs extension. Install it with: INSTALL httpfs; LOAD httpfs;`

unless httpfs is loaded first. This is a **dependency**, not a protocol
limitation, so we resolve it rather than gate the leg: the http leg of
`run-integration.sh` injects `INSTALL httpfs FROM core; LOAD httpfs;` (the
signed core extension) into each staged `.test`, right after the first
awk-injected `LOAD vgi;`. The `.test` files themselves stay transport-agnostic.

The http leg also needs the worker's `http` extra (waitress): `pyproject.toml`
ships an `http` extra (`vgi-python[http]`), the PEP 723 header lists
`vgi-python[http]`, and CI runs `uv sync --frozen --extra http`.
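
To see what the injection does to a staged file, here is a small demonstration that applies the same awk rule `run-integration.sh` uses to a throwaway file. The sample `.test` content is made up for illustration.

```bash
# Build a throwaway sample file (contents are illustrative only).
demo="$(mktemp)"
printf 'statement ok\nLOAD vgi;\n\nquery I\nSELECT 1;\n----\n1\n' > "$demo"

awk '
  { print }
  # After the first bare LOAD vgi; line, append the two httpfs statements once.
  /^LOAD[ \t]+vgi[ \t]*;[ \t]*$/ && !done {
    print ""
    print "statement ok"
    print "INSTALL httpfs FROM core;"
    print ""
    print "statement ok"
    print "LOAD httpfs;"
    done = 1
  }
' "$demo" > "$demo.tmp" && mv "$demo.tmp" "$demo"

cat "$demo"
```

The `!done` flag limits the injection to the first `LOAD vgi;` in each file, so a file that loads the extension more than once still gets exactly one pair of httpfs statements.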

> **Sharp edge — the runner silently SKIPs HTTP errors.** The haybarn/DuckDB
> sqllogictest runner's default skip list skips any statement whose error
> message contains `"HTTP"` or `"Unable to connect"`. Without the httpfs load,
> *every* HTTP-leg test SKIPs (the httpfs binder error contains "HTTP") and the
> suite reports "All tests were skipped" — a green-looking **fake pass**, not a
> real one. `run-integration.sh` therefore fails the leg unless the runner
> reports `All tests passed (N assertions …)` with N > 0 and reports zero
> skips.
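
A rough sketch of that guard, assuming the runner's output has been captured to a file. `assert_real_pass` is a hypothetical name, and the sample summary lines are approximations of the runner's wording rather than verbatim output.

```bash
# Hypothetical helper: succeed only if the captured runner log shows a real pass.
#   $1 = path to the captured runner output
assert_real_pass() {
  # Any mention of skipped tests fails the leg outright.
  if grep -qi 'were skipped' "$1"; then
    echo "tests were skipped" >&2
    return 1
  fi
  # Require a pass summary with a non-zero assertion count.
  if ! grep -Eq 'All tests passed \([1-9][0-9]* assertions' "$1"; then
    echo "no 'All tests passed (N assertions ...)' summary with N>0" >&2
    return 1
  fi
}

log="$(mktemp)"
echo 'All tests passed (21 assertions in 2 test cases)' > "$log"
assert_real_pass "$log" && echo "accepted: real pass"

echo 'All tests were skipped' > "$log"
assert_real_pass "$log" 2>/dev/null || echo "rejected: skipped"
rm -f "$log"
```

Checking the summary text in addition to the exit status matters because a fully skipped run exits 0.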

### Per-transport status

- **subprocess**: GREEN — 21 assertions.
- **http**: GREEN — 25 assertions (21 + the injected httpfs INSTALL/LOAD
statements).
- **unix**: GREEN — 21 assertions. No extra deps; `--unix` is built into the
worker's `Worker.main()`.

The suite here is stateless scalar scoring + a static discovery table function
(`supported_models()`), so none of the inherent HTTP limitations (streaming
partition-local state, etc.) apply — nothing needed gating.

## Run it locally

```bash
uv sync --python 3.13 # install the worker + deps
uv sync --python 3.13 --extra http # install the worker + deps (http extra for the http leg)
# point HAYBARN_UNITTEST at a haybarn-unittest binary (or a local DuckDB
# `unittest` built with the vgi extension), and the worker at the stdio command:
# `unittest` built with the vgi extension). WORKER_CMD is the stdio command that
# runs the worker; the harness uses it directly for subprocess and boots it with
# --http / --unix for the other transports.
HAYBARN_UNITTEST=/path/to/haybarn-unittest \
VGI_CALENDAR_WORKER="uv run --python 3.13 calendar_worker.py" \
ci/run-integration.sh
WORKER_CMD="uv run --python 3.13 rerank_worker.py" \
TRANSPORT=subprocess ci/run-integration.sh # or TRANSPORT=http / TRANSPORT=unix
```

Or use the Makefile target `make test-sql`, which installs `haybarn-unittest`
as a uv tool and points the worker at `uv run --python 3.13 calendar_worker.py`.
`TRANSPORT` defaults to `subprocess`, and `WORKER_CMD` defaults to
`uv run --python 3.13 <repo>/rerank_worker.py`, so a bare
`HAYBARN_UNITTEST=… ci/run-integration.sh` runs the subprocess leg.
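
The defaulting described above relies on ordinary shell parameter expansion. A minimal sketch follows; setting `REPO` to the current directory is an assumption made for the demo, since the script derives it from its own location.

```bash
REPO="$(pwd)"   # demo assumption; the script computes this from its own path

# Start from a clean slate so the defaults are visible.
unset TRANSPORT WORKER_CMD
TRANSPORT="${TRANSPORT:-subprocess}"
WORKER_CMD="${WORKER_CMD:-uv run --python 3.13 $REPO/rerank_worker.py}"
echo "transport=$TRANSPORT"
echo "worker=$WORKER_CMD"

# A required variable aborts with a message when unset. This runs in a subshell
# so the demo itself keeps going.
( unset HAYBARN_UNITTEST
  : "${HAYBARN_UNITTEST:?path to the haybarn-unittest binary}" ) 2>/dev/null \
  || echo "HAYBARN_UNITTEST is required"
```

`${VAR:-default}` substitutes the default when the variable is unset or empty, while `${VAR:?message}` stops the shell instead, which is why only `HAYBARN_UNITTEST` must be supplied by the caller.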
185 changes: 176 additions & 9 deletions ci/run-integration.sh
@@ -5,27 +5,171 @@
# VGI worker, using a prebuilt standalone `haybarn-unittest` and the signed
# community `vgi` extension — no C++ build from source. See ci/README.md.
#
# The SAME suite is exercised over three VGI transports, selected by $TRANSPORT.
# The vgi extension picks the transport from the LOCATION string the .test files
# ATTACH (`${VGI_RERANK_WORKER}`):
#
# subprocess : a bare stdio command (`.venv/bin/python rerank_worker.py`) — the
# extension spawns the worker per query and talks Arrow IPC over
# stdin/stdout. Default; current behavior.
# http : the worker is started out-of-band in `--http` mode on an auto
# port; LOCATION becomes `http://127.0.0.1:<port>`.
# unix : the worker is started out-of-band on an AF_UNIX socket;
# LOCATION becomes `unix:///path/to.sock`.
#
# Required environment:
# HAYBARN_UNITTEST path to the haybarn-unittest binary
# VGI_RERANK_WORKER worker LOCATION the .test files ATTACH (a stdio command
# such as `uv run rerank_worker.py`, or an http:// URL)
# HAYBARN_UNITTEST path to the haybarn-unittest binary
# TRANSPORT subprocess | http | unix (default: subprocess)
# WORKER_CMD the stdio command that runs the worker. Used directly
# as the LOCATION for subprocess, and as the process to
# boot the server for http/unix. Defaults to
# `uv run --python 3.13 <repo>/rerank_worker.py`.
# Optional:
# STAGE scratch dir for the preprocessed test tree (default: mktemp)
# STAGE scratch dir for the preprocessed test tree (default: mktemp)
set -euo pipefail

: "${HAYBARN_UNITTEST:?path to the haybarn-unittest binary}"
: "${VGI_RERANK_WORKER:?worker LOCATION (stdio command or http:// URL)}"

HERE="$(cd "$(dirname "$0")" && pwd)"
REPO="$(cd "$HERE/.." && pwd)"
STAGE="${STAGE:-$(mktemp -d)}"
TRANSPORT="${TRANSPORT:-subprocess}"
WORKER_CMD="${WORKER_CMD:-uv run --python 3.13 $REPO/rerank_worker.py}"

echo "Staging preprocessed tests into $STAGE ..."
mkdir -p "$STAGE/test/sql"
for f in "$REPO"/test/sql/*.test; do
awk -f "$HERE/preprocess-require.awk" "$f" > "$STAGE/test/sql/$(basename "$f")"
done

# ---------------------------------------------------------------------------
# Per-transport: resolve VGI_RERANK_WORKER (the LOCATION) and, for the
# out-of-band transports, boot the worker server + arrange trap-cleanup.
# ---------------------------------------------------------------------------
SERVER_PID=""
SOCK=""
PORT_FILE=""

cleanup() {
# Capture the real exit status FIRST: an EXIT trap whose last command returns
# non-zero (e.g. a short-circuited `[[ -n "" ]] && …` on the subprocess/unix
# legs where nothing needs cleaning) would otherwise become the script's exit
# status under `set -e` and fail an already-passing run.
local rc=$?
if [[ -n "$SERVER_PID" ]]; then
kill "$SERVER_PID" 2>/dev/null || true
wait "$SERVER_PID" 2>/dev/null || true
fi
if [[ -n "$SOCK" ]]; then rm -f "$SOCK"; fi
if [[ -n "$PORT_FILE" ]]; then rm -f "$PORT_FILE"; fi
return "$rc"
}
trap cleanup EXIT

case "$TRANSPORT" in
subprocess)
export VGI_RERANK_WORKER="$WORKER_CMD"
;;

http)
# The vgi extension's HTTP transport is implemented on top of DuckDB's
# httpfs extension, so an `http://` ATTACH fails at bind time with
# "Binder Error: VGI HTTP transport requires the httpfs extension."
# unless httpfs is loaded first. (The haybarn sqllogictest runner's default
# skip list swallows any error containing "HTTP", so without this the whole
# suite would silently SKIP rather than fail — a fake pass.) The .test files
# are transport-agnostic; inject a signed `INSTALL httpfs FROM core; LOAD
# httpfs;` right after the awk-injected `LOAD vgi;` in each staged file, so
# httpfs is present only when we actually run over HTTP.
echo "Injecting httpfs load into staged tests (HTTP transport needs it) ..."
for sf in "$STAGE"/test/sql/*.test; do
awk '
{ print }
/^LOAD[ \t]+vgi[ \t]*;[ \t]*$/ && !done {
print "";
print "statement ok";
print "INSTALL httpfs FROM core;";
print "";
print "statement ok";
print "LOAD httpfs;";
done = 1
}
' "$sf" > "$sf.tmp" && mv "$sf.tmp" "$sf"
done

# Boot the worker in HTTP mode on an auto-selected port. The worker writes
# the chosen port to --port-file atomically (tmp + rename), so we watch for
# the file to appear rather than parsing stdout. HTTP mode needs the `http`
# extra (waitress); WORKER_CMD must resolve it — CI runs
# `uv sync --extra http` and the PEP 723 header lists `vgi-python[http]`.
PORT_FILE="$(mktemp -u "${TMPDIR:-/tmp}/rerank-port.XXXXXX")"
LOG_FILE="${TMPDIR:-/tmp}/rerank-http-server.log"
echo "Starting HTTP worker: $WORKER_CMD --http --port 0 --port-file $PORT_FILE"
# shellcheck disable=SC2086
( cd "$REPO" && exec $WORKER_CMD --http --port 0 --port-file "$PORT_FILE" ) > "$LOG_FILE" 2>&1 &
SERVER_PID=$!

PORT=""
for _ in $(seq 1 240); do
if ! kill -0 "$SERVER_PID" 2>/dev/null; then
echo "ERROR: HTTP worker exited before reporting a port. Log:" >&2
cat "$LOG_FILE" >&2
exit 1
fi
if [[ -s "$PORT_FILE" ]]; then
PORT="$(tr -d '[:space:]' < "$PORT_FILE")"
[[ -n "$PORT" ]] && break
fi
sleep 0.5
done
if [[ -z "$PORT" ]]; then
echo "ERROR: timed out waiting for HTTP worker port-file. Log:" >&2
cat "$LOG_FILE" >&2
exit 1
fi
echo "HTTP worker ready on port $PORT (pid $SERVER_PID)"
export VGI_RERANK_WORKER="http://127.0.0.1:$PORT"
;;

unix)
# Boot the worker bound to an AF_UNIX socket. The worker prints
# `UNIX:<abs-path>` once bound; we poll for the socket file to appear.
SOCK="${TMPDIR:-/tmp}/rerank-$$.sock"
rm -f "$SOCK"
LOG_FILE="${TMPDIR:-/tmp}/rerank-unix-server.log"
echo "Starting unix worker: $WORKER_CMD --unix $SOCK"
# shellcheck disable=SC2086
( cd "$REPO" && exec $WORKER_CMD --unix "$SOCK" ) > "$LOG_FILE" 2>&1 &
SERVER_PID=$!

READY=""
for _ in $(seq 1 240); do
if ! kill -0 "$SERVER_PID" 2>/dev/null; then
echo "ERROR: unix worker exited before binding the socket. Log:" >&2
cat "$LOG_FILE" >&2
exit 1
fi
if [[ -S "$SOCK" ]]; then
READY=1
break
fi
sleep 0.5
done
if [[ -z "$READY" ]]; then
echo "ERROR: timed out waiting for unix worker socket. Log:" >&2
cat "$LOG_FILE" >&2
exit 1
fi
echo "unix worker ready on $SOCK (pid $SERVER_PID)"
export VGI_RERANK_WORKER="unix://$SOCK"
;;

*)
echo "ERROR: unknown TRANSPORT '$TRANSPORT' (want subprocess|http|unix)" >&2
exit 2
;;
esac

cd "$STAGE"

# Warm the extension cache once: vgi from the signed community channel. A miss
@@ -42,7 +186,30 @@ EOF
"$HAYBARN_UNITTEST" "test/_warm.test" >/dev/null 2>&1 || echo "::warning::extension warm step did not fully succeed"
rm -f "$STAGE/test/_warm.test"

# Run the whole suite in one invocation, streaming the runner's native
# sqllogictest report. Any failed assertion exits non-zero and fails the job.
echo "Running suite (worker: $VGI_RERANK_WORKER) ..."
"$HAYBARN_UNITTEST" "test/sql/*"
# Run the whole suite in one invocation, capturing the runner's native
# sqllogictest report so we can both stream it AND assert on the summary line.
echo "Running suite (transport: $TRANSPORT, worker: $VGI_RERANK_WORKER) ..."
RUN_LOG="$STAGE/run.log"
set +e
"$HAYBARN_UNITTEST" "test/sql/*" 2>&1 | tee "$RUN_LOG"
status=${PIPESTATUS[0]}
set -e

# SILENT-SKIP GUARD (critical for the http leg). DuckDB's sqllogictest runner
# auto-SKIPS (exit 0!) any test whose error message contains "HTTP" or "Unable
# to connect" — so a broken http setup reports "All tests were skipped" and the
# job goes GREEN while testing nothing. Fail the leg unless the runner reports a
# real pass with N>0 assertions and zero skips.
if [[ $status -ne 0 ]]; then
echo "ERROR: haybarn-unittest exited $status" >&2
exit "$status"
fi
if grep -qi 'were skipped' "$RUN_LOG"; then
echo "ERROR: tests were SKIPPED (likely a masked $TRANSPORT transport error — see above)." >&2
exit 1
fi
if ! grep -Eq 'All tests passed \([1-9][0-9]* assertions' "$RUN_LOG"; then
echo "ERROR: did not find an 'All tests passed (N assertions ...)' summary with N>0." >&2
exit 1
fi
echo "Suite GREEN over transport: $TRANSPORT"
6 changes: 6 additions & 0 deletions pyproject.toml
@@ -41,6 +41,12 @@ dev = [
"mypy>=1.10",
"pydoclint>=0.5",
]
# HTTP transport: the worker can serve over HTTP (`rerank_worker.py --http`) in
# addition to stdio/AF_UNIX. That path needs vgi-python's `http` extra
# (waitress). CI's http transport leg installs this via `uv sync --extra http`.
http = [
"vgi-python[http]>=0.8.3",
]

[project.urls]
Homepage = "https://query.farm"