Add self-training pipeline, demos, smoke tests, benchmarks, and production-grade tooling #18
Changes from all commits
3b84411
caa9a08
f29fcc2
05a0025
| Original file line number | Diff line number | Diff line change | ||||||
|---|---|---|---|---|---|---|---|---|
| @@ -0,0 +1,50 @@ | ||||||||
| name: Smoke Tests | ||||||||
|
|
||||||||
| on: | ||||||||
| push: | ||||||||
| branches: ["**"] | ||||||||
| pull_request: | ||||||||
| branches: ["**"] | ||||||||
|
|
||||||||
| jobs: | ||||||||
| smoke-tests: | ||||||||
| name: Smoke Tests (Python ${{ matrix.python-version }}) | ||||||||
| runs-on: ubuntu-latest | ||||||||
| permissions: | ||||||||
| contents: read | ||||||||
| strategy: | ||||||||
| matrix: | ||||||||
| python-version: ["3.10", "3.11"] | ||||||||
|
|
||||||||
| steps: | ||||||||
| - name: Checkout repository | ||||||||
| uses: actions/checkout@v4 | ||||||||
|
|
||||||||
| - name: Set up Python ${{ matrix.python-version }} | ||||||||
| uses: actions/setup-python@v5 | ||||||||
| with: | ||||||||
| python-version: ${{ matrix.python-version }} | ||||||||
| cache: pip | ||||||||
|
|
||||||||
| - name: Install dependencies | ||||||||
| run: | | ||||||||
| python -m pip install --upgrade pip | ||||||||
| pip install pytest pytest-cov pytest-timeout pyyaml | ||||||||
| # Install lightweight subset of requirements (skip heavy GPU libs). | ||||||||
| pip install numpy tqdm || true | ||||||||
|
|
||||||||
| - name: Run smoke tests | ||||||||
| run: | | ||||||||
| python -m pytest tests/test_smoke.py -v --tb=short --timeout=120 | ||||||||
|
The smoke test command uses …
||||||||
|
|
||||||||
| - name: Run legacy toolkit tests | ||||||||
| run: | | ||||||||
Suggested change:
- run: |
+ run: |
+   set -o pipefail
Preserve unittest exit status in smoke workflow
The `Run legacy toolkit tests` step pipes `python -m unittest ...` into `tail`, so the step exits with `tail`'s status instead of the test runner's status. In GitHub Actions' default shell settings, this allows failing toolkit tests to appear green, which undermines CI gating for regressions in `test_godmode_toolkit`.
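To illustrate the exit-status point in the comment above, here is a minimal Python sketch (not part of this PR) that runs two bash pipelines and compares their exit codes. `false` stands in for a failing `python -m unittest ...` run, the helper name `pipeline_exit_code` is hypothetical, and the sketch assumes `bash` is available on the PATH.

```python
import subprocess


def pipeline_exit_code(script: str) -> int:
    """Run a snippet under bash and return the exit status of the whole script."""
    return subprocess.run(["bash", "-c", script], check=False).returncode


# `false` is a stand-in for a failing test runner (an assumption for illustration).
# Without pipefail, the pipeline reports the status of its last command, `tail`.
without_pipefail = pipeline_exit_code("false | tail -5")

# With pipefail, the pipeline reports the last non-zero status in the pipeline.
with_pipefail = pipeline_exit_code("set -o pipefail; false | tail -5")

print(without_pipefail)  # 0: the failure is masked by tail
print(with_pipefail)     # 1: the failure propagates
```

Either adding `set -o pipefail` or dropping the `| tail -5` pipe should keep a failing test run visible to the CI step.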
| Original file line number | Diff line number | Diff line change | ||||
|---|---|---|---|---|---|---|
| @@ -0,0 +1,46 @@ | ||||||
| .PHONY: install install-dev test smoke lint format benchmark demo clean help | ||||||
|
|
||||||
| PYTHON ?= python3 | ||||||
| PIP ?= pip | ||||||
|
|
||||||
| help: | ||||||
| @echo "Victor LLM – Makefile targets" | ||||||
| @echo "" | ||||||
| @echo " install Install runtime dependencies" | ||||||
| @echo " install-dev Install dev/test dependencies" | ||||||
| @echo " test Run all tests (smoke + toolkit)" | ||||||
| @echo " smoke Run only smoke tests (fast)" | ||||||
| @echo " lint Lint with ruff" | ||||||
| @echo " format Auto-format with ruff" | ||||||
| @echo " benchmark Run inference benchmark (5 prompts)" | ||||||
| @echo " demo Run the end-to-end demo" | ||||||
| @echo " clean Remove generated artifacts" | ||||||
|
|
||||||
| install: | ||||||
| $(PIP) install -r requirements.txt | ||||||
|
|
||||||
| install-dev: install | ||||||
| $(PIP) install pytest pytest-cov pytest-timeout ruff pyyaml | ||||||
|
|
||||||
| test: smoke | ||||||
| $(PYTHON) -m unittest test_godmode_toolkit -v 2>&1 | tail -5 | ||||||
Suggested change:
- $(PYTHON) -m unittest test_godmode_toolkit -v 2>&1 | tail -5
+ $(PYTHON) -m unittest test_godmode_toolkit -v
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,53 @@ | ||
| # Victor LLM Benchmarks | ||
|
|
||
| This directory contains benchmarking infrastructure for Victor LLM. | ||
|
|
||
| ## Structure | ||
|
|
||
| ``` | ||
| benchmarks/ | ||
| harness.py ← standalone benchmarking harness (latency, throughput, memory) | ||
| results/ ← JSON results from past benchmark runs (auto-created) | ||
| README.md ← this file | ||
| ``` | ||
|
|
||
| ## Quick Start | ||
|
|
||
| ```bash | ||
| # Run inference benchmark (no checkpoint needed) | ||
| python benchmarks/harness.py | ||
|
|
||
| # Run with a trained checkpoint | ||
| python benchmarks/harness.py --checkpoint runs/<run_id> | ||
|
|
||
| # Use the victor CLI | ||
| victor benchmark --prompts 20 --max-tokens 128 | ||
| ``` | ||
|
|
||
| ## Metrics Captured | ||
|
|
||
| | Metric | Description | | ||
| |--------|-------------| | ||
| | `latency_mean_s` | Mean per-prompt inference time (seconds) | | ||
| | `latency_median_s` | Median per-prompt inference time | | ||
| | `latency_min_s` / `latency_max_s` | Min / max latency | | ||
| | `latency_stdev_s` | Standard deviation of latency | | ||
| | `throughput_tokens_per_s` | Total tokens generated ÷ total time | | ||
| | `memory_before_mb` | RSS before benchmark (MB) | | ||
| | `memory_after_mb` | RSS after benchmark (MB) | | ||
| | `memory_delta_mb` | Memory growth during benchmark | | ||
|
|
||
| ## Comparing Runs | ||
|
|
||
| Results are stored as timestamped JSON files in `benchmarks/results/`. | ||
| Use the compare helper: | ||
|
|
||
| ```bash | ||
| python benchmarks/harness.py --compare benchmarks/results/ | ||
| ``` | ||
|
|
||
| ## Adding a Training Benchmark | ||
|
|
||
| ```bash | ||
| python benchmarks/harness.py --mode training --dataset datasets/example_dataset --epochs 1 | ||
| ``` |
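As a rough illustration of how the latency and throughput metrics listed in the README could be derived, here is a minimal Python sketch. It is not the code in `benchmarks/harness.py`: the function name `summarize_latencies` is hypothetical, the use of the sample standard deviation is an assumption, and "total time" is assumed to be the sum of per-prompt latencies. The memory metrics are left out because measuring RSS depends on the platform or on a third-party library.

```python
import statistics


def summarize_latencies(latencies_s, tokens_generated):
    """Summarize per-prompt latencies (seconds) using the README's metric names.

    Hypothetical helper for illustration; the real harness may differ.
    """
    if not latencies_s:
        raise ValueError("at least one latency measurement is required")
    # Assumption: total time is the sum of the per-prompt latencies.
    total_time_s = sum(latencies_s)
    return {
        "latency_mean_s": statistics.mean(latencies_s),
        "latency_median_s": statistics.median(latencies_s),
        "latency_min_s": min(latencies_s),
        "latency_max_s": max(latencies_s),
        # Sample standard deviation needs at least two points; fall back to 0.0.
        "latency_stdev_s": statistics.stdev(latencies_s) if len(latencies_s) > 1 else 0.0,
        "throughput_tokens_per_s": tokens_generated / total_time_s if total_time_s > 0 else 0.0,
    }


summary = summarize_latencies([0.5, 1.0, 1.5], tokens_generated=60)
print(summary["throughput_tokens_per_s"])  # 20.0 (60 tokens over 3.0 seconds)
```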
`pip install numpy tqdm || true` will hide install failures and lead to confusing downstream import/runtime errors. If these dependencies are required for the smoke tests, let the step fail; if they’re optional, gate the tests that need them accordingly.
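One way to gate tests on an optional dependency, sketched here with the standard library only, is to check for the module at import time and skip the tests that need it. The test class, test names, and the `has_module` helper are hypothetical and are not taken from `tests/test_smoke.py`.

```python
import importlib.util
import unittest


def has_module(name: str) -> bool:
    """Return True if a top-level module can be imported, without importing it."""
    return importlib.util.find_spec(name) is not None


HAS_NUMPY = has_module("numpy")


class OptionalDependencySmokeTest(unittest.TestCase):
    """Hypothetical smoke tests: one needs numpy, one does not."""

    @unittest.skipUnless(HAS_NUMPY, "numpy is not installed")
    def test_numpy_sum(self):
        import numpy as np

        self.assertEqual(int(np.arange(4).sum()), 6)

    def test_core_path_without_optional_deps(self):
        self.assertEqual(sum(range(4)), 6)


# Run the suite programmatically; a skipped test is reported, not failed.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(OptionalDependencySmokeTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

With this pattern the install step can fail loudly when a dependency is required, while a missing optional dependency shows up as an explicit skip in the test report.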