GSO Scaffolds

Minimal infra and examples for running AI coding agents on the GSO benchmark, and producing GSO-compatible predictions for evaluation. To install, run:

uv pip install -e .

# This installs two CLI entrypoints:
# `gso-harbor` → `harbor/`
# `gso-openhands` → `openhands_gso/`

GSO Summary

All scaffolds are expected to output a GSO-compatible JSONL with (at minimum):

instance_id
model_patch: a git diff (patch to apply)
model_name_or_path: any identifier string
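
For a custom scaffold, a predictions file with these three fields could be written along the following lines. This is a minimal sketch rather than code from this repo: the helper name `write_predictions`, the instance ID, the patch text, and the model name are all hypothetical placeholders.

```python
import json
from pathlib import Path

# The three fields listed above as the minimum for a GSO prediction.
REQUIRED_FIELDS = ("instance_id", "model_patch", "model_name_or_path")


def write_predictions(predictions, output_path):
    """Write one JSON object per line (JSONL), rejecting incomplete records.

    Hypothetical helper for illustration; not part of this repo's CLI.
    """
    path = Path(output_path)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w", encoding="utf-8") as f:
        for pred in predictions:
            missing = [key for key in REQUIRED_FIELDS if key not in pred]
            if missing:
                raise ValueError(f"prediction is missing fields: {missing}")
            # Extra fields are kept, since the list above is only a minimum.
            f.write(json.dumps(pred) + "\n")
    return path


# Placeholder values only; real instance IDs come from the GSO dataset.
example_prediction = {
    "instance_id": "example-instance-id",
    "model_patch": "diff --git a/foo.py b/foo.py\n",
    "model_name_or_path": "my-agent",
}
```

Calling `write_predictions([example_prediction], "predictions/predictions.jsonl")` would produce a one-line file; the output path here is likewise only an example.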

Then, the GSO eval harness can evaluate the predictions (testing in dockerized environments, scoring, etc.) as shown in the official GSO codebase. Please refer to the documentation there for more details on how to run the eval harness.

Scaffolds Overview

This repo currently includes two practical integrations/examples:

harbor/:
- convert GSO tasks into Harbor tasks
- run any Harbor-compatible agent (e.g. Codex, Claude Code, OpenHands, etc.)
- automatically run the GSO eval harness in-container on the agent's patches, then export GSO predictions and evaluation results
openhands_gso/:
- run any version of the OpenHands engine and export GSO predictions
- manually run the official GSO eval harness on the agent's generated patches

Scaffold Details

Harbor

Use this when you want to run agents through Harbor (e.g. Codex, Claude Code, OpenHands, etc.) by converting GSO instances into Harbor tasks, then exporting Harbor job results back into a GSO predictions.jsonl.

Convert GSO → Harbor tasks:

gso-harbor convert --dataset gso-bench/gso --output ./harbor-tasks/

Run with Harbor (example: oracle for validation):

harbor run --agent oracle --path ./harbor-tasks/<task-name> -n 1

Export Harbor results → GSO predictions:

gso-harbor export-results --harbor-results ./jobs/<job-name> --output ./predictions/

For more on the Harbor task layout, see harbor/README.md.
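
Before handing exported predictions to the eval harness, it can help to confirm that every line parses and carries the minimum fields. The check below is a sketch under assumptions: `check_predictions` is a hypothetical helper, not part of `gso-harbor`, and it tests only for the three fields named in the GSO Summary.

```python
import json

REQUIRED_FIELDS = ("instance_id", "model_patch", "model_name_or_path")


def check_predictions(lines):
    """Return (line_number, problem) tuples for an iterable of JSONL lines.

    Hypothetical helper for illustration; an empty list means no problems found.
    """
    problems = []
    for number, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # ignore blank lines
        try:
            record = json.loads(line)
        except json.JSONDecodeError as exc:
            problems.append((number, f"invalid JSON: {exc.msg}"))
            continue
        if not isinstance(record, dict):
            problems.append((number, "not a JSON object"))
            continue
        for field in REQUIRED_FIELDS:
            if field not in record:
                problems.append((number, f"missing field: {field}"))
    return problems
```

An open file handle can be passed directly, for example `check_predictions(open(path, encoding="utf-8"))`, where `path` points at the exported JSONL. The exact file name written under `./predictions/` is not stated here, so check the export output for it.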

OpenHands

Use this when you want to run OpenHands directly while still being able to pin any OpenHands version via Git tags/releases. For instance, you may want to fork OpenHands and add custom features. This should also serve as a useful example for other custom agents.

Example: pin OpenHands v1.3.0

uv run \
  --with "openhands-ai @ git+https://github.com/All-Hands-AI/OpenHands.git@v1.3.0" \
  --project . \
  gso-openhands --help

For more usage details for OpenHands (including config.toml), see openhands_gso/README.md.
