Hi there,
I just ran the LV-Eval 128k subset on the Qwen2.5-3B-Instruct model (AHN not enabled). The scores are far lower than the numbers reported in the paper.
Here is a screenshot of my results. I ran the evaluation on a single A6000 Ada (48 GB VRAM) in single-process mode. Could you please share the model predictions (the JSON files) so I can check what went wrong?
