Bug report in otb project #77

Description

@hkr04

In UnderThinkingBench, some items have no "source_dataset" key in their metadata. They have a "source" key instead, with the values "aime" and "hmmt", and appear to belong to the UnderThinkingBench-Math subset.

UnderThinkingBench entry:

RAM/projects/otb/eval.py

Lines 39 to 40 in 264c047

elif subset == "underthinking-bench" or "underthinking" in subset:
    acc = eval_underthink(row)

Error encountered here:

def eval_underthink(row, find_last_box: bool = False) -> float:
    puzzle = json.loads(row["metadata"])["source_dataset"]
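
A minimal sketch of why this lookup fails, assuming the affected rows store their metadata as a JSON string that contains "source" but not "source_dataset". The row below is made up for illustration and is not taken from the dataset:

```python
import json

# Hypothetical row shaped like the affected items: "source" is present,
# "source_dataset" is not.
row = {"metadata": json.dumps({"source": "aime"})}

metadata = json.loads(row["metadata"])

try:
    puzzle = metadata["source_dataset"]
    error = None
except KeyError as exc:
    # Indexing a dict with a missing key raises KeyError.
    error = exc

print(repr(error))
```

Any row without "source_dataset" would hit the same exception on the first line of eval_underthink, before any scoring happens.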

Possible solution in eval.py:

import json

# ...

elif subset == "underthinking-bench" or "underthinking" in subset:
    metadata = json.loads(row["metadata"])
    if "source" in metadata and metadata["source"] in ["aime", "hmmt"]:
        acc = eval_math(row, tokenizer, model_name)
    else:
        acc = eval_underthink(row)
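
The routing decision above can also be isolated and checked without the real evaluators. This is only a sketch: `pick_underthinking_evaluator` and `MATH_SOURCES` are hypothetical names that do not exist in eval.py, and the example rows are invented. It assumes "aime" and "hmmt" are the only "source" values that need the math evaluator, as reported in this issue.

```python
import json

# Values of "source" reported in this issue (assumed to be the full set).
MATH_SOURCES = {"aime", "hmmt"}


def pick_underthinking_evaluator(row: dict) -> str:
    """Return "math" for rows that should go to eval_math, else "underthink".

    Hypothetical helper for illustration only.
    """
    metadata = json.loads(row["metadata"])
    # .get returns None when "source" is absent, so rows that only carry
    # "source_dataset" fall through to the underthink path.
    if metadata.get("source") in MATH_SOURCES:
        return "math"
    return "underthink"


# Invented example rows covering both metadata shapes.
math_row = {"metadata": json.dumps({"source": "hmmt"})}
puzzle_row = {"metadata": json.dumps({"source_dataset": "some_puzzle"})}

print(pick_underthinking_evaluator(math_row))
print(pick_underthinking_evaluator(puzzle_row))
```

Keeping the check in one small function means the branch in eval.py only has to map "math" to eval_math and "underthink" to eval_underthink.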
