[PB] Packaging fixes to various lib to enable non-editable installing by stalkermustang · Pull Request #82 · openai/frontier-evals

stalkermustang · 2025-10-26T11:40:40Z

Hello OpenAI team,

I'm currently working on adapting PaperBench for the PrimeIntellect Environments hub (PR). This initiative closely resembles the previously reviewed MLE Bench Port—many thanks to @thesofakillers for an extensive review there!

I'm facing an issue while trying to reuse the existing Judge code from PaperBench, as I'd prefer adhering to the reference implementation rather than reinventing the wheel. To achieve this, I introduced PaperBench as a dependency and attempted to import SimpleJudge as follows:

from paperbench.judge.simple import SimpleJudge

However, this raises a couple of dependency-related issues:

alcatraz: The import chain eventually needs JudgeOutput, which in turn leads to:

from alcatraz.clusters.local import BaseAlcatrazCluster, ClusterConfig, LocalConfig

SimpleJudge: Internally attempts to import:

from nanoeval.solvers.computer_tasks.code_execution_interface import ComputerInterface
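A quick way to confirm which of these module paths are actually reachable in an installed environment is to ask the import system directly. The snippet below is a minimal sketch, not part of PaperBench; the helper name `missing_modules` is hypothetical, and it relies only on the standard-library `importlib.util.find_spec`.

```python
import importlib.util


def missing_modules(dotted_names):
    """Return the dotted module names that cannot be located in this environment."""
    missing = []
    for name in dotted_names:
        try:
            # find_spec imports parent packages first, so a missing or
            # unimportable parent surfaces as an ImportError here.
            spec = importlib.util.find_spec(name)
        except ImportError:
            spec = None
        if spec is None:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # The two module paths quoted in the description above.
    print(missing_modules([
        "alcatraz.clusters.local",
        "nanoeval.solvers.computer_tasks.code_execution_interface",
    ]))
```

Running this inside the affected virtual environment should list whichever of the two paths the non-editable install failed to ship; an empty list would mean both resolve.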

How to reproduce in a clean environment (5 min):

mkdir <some_name> && cd <some_name> && uv init --name frontier_evals_fix
Update pyproject.toml with this:

[project]
name = "frontier-evals-fix"
version = "0.1.0"
description = "Add your description here"
readme = "README.md"
requires-python = ">=3.12"
dependencies = [
    "paperbench",
]

[tool.uv.sources]
paperbench = {git = "https://github.com/openai/frontier-evals.git", subdirectory = "project/paperbench", rev = "650e03aede9b5d5a81e9045bbc9843dd665d8fe4"}

uv sync runs fine
Update main.py with this:

def main():
    from paperbench.judge.simple import SimpleJudge
    print('Import is fine, ', SimpleJudge.__name__)

if __name__ == "__main__":
    main()

uv run main.py will show the issue:

Traceback (most recent call last):
  File "/Users/seeall/Documents/_WORKSPACES/tmp_fr_ev/main.py", line 6, in <module>
    main()
  File "/Users/seeall/Documents/_WORKSPACES/tmp_fr_ev/main.py", line 2, in main
    from paperbench.judge.simple import SimpleJudge
  File "/Users/seeall/Documents/_WORKSPACES/tmp_fr_ev/.venv/lib/python3.12/site-packages/paperbench/judge/simple.py", line 26, in <module>
    from nanoeval.solvers.computer_tasks.code_execution_interface import ComputerInterface
ModuleNotFoundError: No module named 'nanoeval.solvers'

After a short debugging session with Codex, the only viable solution we found involved using an admittedly inelegant stub that modifies the PATH. I've looked into several adjustments to pyproject.toml, but unfortunately, none resolved the underlying issue.

The proposed fix in this PR addresses the problem by explicitly expanding the modules exposed for these two packages. An alternative approach for alcatraz might involve adding __init__.py files; however, it appears this was intentionally avoided in your design. For nanoeval, although __init__.py files exist, the exportable scope remains limited, perhaps intentionally, though I'm unsure.
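To see which subdirectories are candidates for being dropped from a non-editable build, one can scan a package's source tree for directories that contain Python files but no `__init__.py`. This is only a sketch under the assumption that the build backend skips such directories unless they are listed explicitly; the helper name `dirs_missing_init` is hypothetical, and the actual behavior depends on each package's pyproject.toml configuration.

```python
from pathlib import Path


def dirs_missing_init(package_root):
    """List subdirectories that hold .py files but have no __init__.py."""
    root = Path(package_root)
    missing = []
    for directory in sorted(p for p in root.rglob("*") if p.is_dir()):
        # Only directories that actually contain Python sources matter.
        has_python = any(
            f.suffix == ".py" for f in directory.iterdir() if f.is_file()
        )
        if has_python and not (directory / "__init__.py").exists():
            missing.append(directory.relative_to(root).as_posix())
    return missing
```

Pointing it at a local checkout of a package's source directory gives a list of relative paths to compare against what the package configuration exposes.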

Please let me know if you have a preferred approach or any suggestions for implementing this differently.

Additionally, I've updated the CONTEXT_WINDOW_LENGTHS mapping to include various GPT-5 family models. I've set the context length to 272k instead of 400k, as the existing implementation seems primarily concerned with input length rather than total tokens. If this interpretation is incorrect, please advise, and I can update the values accordingly.
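The difference between the two figures can be made concrete with a little arithmetic. The sketch below is illustrative only: the constants come from the numbers quoted above, while `fits_in_window` is a hypothetical helper and does not reflect how the real CONTEXT_WINDOW_LENGTHS mapping is consumed in the repository.

```python
# Hypothetical illustration; the real CONTEXT_WINDOW_LENGTHS mapping lives in
# the repository and its keys and values may differ.
TOTAL_CONTEXT = 400_000  # total tokens (input + output), per the PR text
INPUT_LIMIT = 272_000    # input-only figure initially chosen in the PR


def fits_in_window(prompt_tokens: int, window: int) -> bool:
    """Return True if a prompt of `prompt_tokens` tokens fits under `window`."""
    return prompt_tokens <= window


# Tokens left over for output when the input budget is capped at INPUT_LIMIT.
reserved_for_output = TOTAL_CONTEXT - INPUT_LIMIT
print(reserved_for_output)

# A 300k-token prompt passes a check against the total but not the input cap.
print(fits_in_window(300_000, INPUT_LIMIT), fits_in_window(300_000, TOTAL_CONTEXT))
```

Which value is appropriate depends on whether the consuming code compares the mapping against prompt length alone or against prompt plus completion.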

Thanks in advance for your feedback!

thesofakillers

Thank you for flagging this.

You are right: we had not properly packaged some of the libraries, and we didn't catch this because we typically work with them installed in editable mode.

I've left some comments adjusting your fixes. Thanks!

thesofakillers · 2025-10-28T09:38:57Z

This looks good to me. Not sure what's going on with CI; I think it's still the OPENAI_API_KEY. I'll just test it locally now and override the CI result if the tests pass.

stalkermustang · 2025-10-28T10:56:28Z

> This looks good to me, not sure what's going on with CI i think it's still the OPENAI_API_KEY. I'll just test it locally now and override it if they pass

Thanks for merging.
I've checked the CI logs, it seems one needs to specify GRADER_OPENAI_API_KEY as well?

thesofakillers · 2025-10-31T09:21:07Z

> Thanks for merging. I've checked the CI logs, it seems one needs to specify GRADER_OPENAI_API_KEY as well?

I don't think so, we only have OPENAI_API_KEY configured in the github settings here and that seems to be enough. Could just be weirdness with how CI interacts with forks 🤷🏻‍♂️
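One way to narrow down this kind of CI question is a preflight check that reports which expected variables are absent before any tests run. The sketch below is an assumption-laden illustration, not the repository's CI logic: `unset_keys` is a hypothetical helper, and the two variable names are simply the ones mentioned in this thread.

```python
import os


def unset_keys(names, environ=None):
    """Return the variable names that are absent or empty in the environment."""
    env = os.environ if environ is None else environ
    return [name for name in names if not env.get(name)]


if __name__ == "__main__":
    # Variable names taken from the discussion above.
    print(unset_keys(["OPENAI_API_KEY", "GRADER_OPENAI_API_KEY"]))
```

If such a check reported OPENAI_API_KEY as unset on a pull request from a fork, that would be consistent with the fork explanation, since secrets configured in a repository are not necessarily exposed to workflows triggered from forks.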

stalkermustang added 3 commits October 26, 2025 11:35

changes to alcatraz and nanoeval pyproject files to include necessary dirs + added gpt-5 models to the context len mapping

8e96a92

updated pyproject for alcatraz

8bdbdfc

added utils to alcatraz exposed modules

7d1f515

stalkermustang mentioned this pull request Oct 26, 2025

PaperBench Env v1.0 PrimeIntellect-ai/community-environments#220 (Open, 14 tasks)

thesofakillers suggested changes Oct 27, 2025

thesofakillers changed the title ~~Fixes to paperbench as an installable package for benchmarking~~ [PB] Fixes to various libraries to enable non-editable installing Oct 27, 2025

thesofakillers changed the title ~~[PB] Fixes to various libraries to enable non-editable installing~~ [PB] Packaging fixes to various lib to enable non-editable installing Oct 27, 2025

feedback fixes: gpt-5 context len -> 400k, pyproject.toml fixes for all packages in project/common

f98c9a6

stalkermustang requested a review from thesofakillers October 27, 2025 14:53

thesofakillers approved these changes Oct 28, 2025

thesofakillers merged commit 7acfcc7 into openai:main Oct 28, 2025
6 of 8 checks passed

thesofakillers merged 4 commits into openai:main from stalkermustang:fix/fixes-to-paperbench