In UnderThinkingBench, some items have no "source_dataset" key in their metadata; they have a "source" key instead, with the values "aime" or "hmmt". These items seem to belong to the UnderThinkingBench-Math subset.
UnderThinkingBench entry:
    elif subset == "underthinking-bench" or "underthinking" in subset:
        acc = eval_underthink(row)
Error encountered here (the lookup raises a KeyError for items whose metadata has no "source_dataset" key):
    def eval_underthink(row, find_last_box: bool = False) -> float:
        puzzle = json.loads(row["metadata"])["source_dataset"]
Possible solution in eval.py:
    import json
    # ...
    elif subset == "underthinking-bench" or "underthinking" in subset:
        metadata = json.loads(row["metadata"])
        # Math items carry "source" (aime/hmmt) instead of "source_dataset".
        if "source" in metadata and metadata["source"] in ["aime", "hmmt"]:
            acc = eval_math(row, tokenizer, model_name)
        else:
            acc = eval_underthink(row)
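To make the routing rule easy to check in isolation, here is a minimal sketch of the same decision as a standalone function. The helper name `route_underthink_row`, the `MATH_SOURCES` set, and the sample rows are hypothetical and not part of the repository; the sketch only assumes that `row["metadata"]` is a JSON string encoding an object, as in the snippets above.

```python
import json

# Values of "source" reported in this issue for the math items.
MATH_SOURCES = {"aime", "hmmt"}


def route_underthink_row(row: dict) -> str:
    """Hypothetical helper: decide which evaluator a row should go to.

    Returns "math" for items tagged with an AIME/HMMT source and
    "puzzle" for items that carry "source_dataset".
    """
    metadata = json.loads(row["metadata"])
    if metadata.get("source") in MATH_SOURCES:
        return "math"
    if "source_dataset" in metadata:
        return "puzzle"
    # Fail loudly instead of hitting an unexplained KeyError later on.
    raise KeyError("metadata has neither a known 'source' nor 'source_dataset'")


# Made-up rows for illustration only.
math_row = {"metadata": json.dumps({"source": "aime"})}
puzzle_row = {"metadata": json.dumps({"source_dataset": "example_puzzle"})}

print(route_underthink_row(math_row))
print(route_underthink_row(puzzle_row))
```

Using `dict.get` avoids the KeyError when "source" is absent, and the explicit final check would surface any third metadata shape that neither evaluator expects.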
Code references (at 264c047):
RAM/projects/otb/eval.py, lines 39 to 40
RAM/projects/otb/evals/underthink_eval.py, lines 33 to 34