Program-of-Thought LLM agent for conversational numerical reasoning over financial documents (ConvFinQA). 82% execution accuracy with the cheapest model; reproducible evaluation, executor-checked gold labels, and a documented negative-result analysis.
python evaluation financial-nlp llm numerical-reasoning llm-agent deepseek program-of-thought convfinqa
Updated Jun 25, 2026 · Language: Python
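In a Program-of-Thought setup, the model emits a program and an executor runs it. Execution accuracy is the share of answers whose executed value matches the gold answer. The following is a minimal sketch of both under stated assumptions. It assumes programs are written in the FinQA/ConvFinQA arithmetic DSL (for example `subtract(206588, 181001), divide(#0, 181001)`, where `#0` refers to the first step's result). It also assumes a hypothetical relative tolerance of 1e-3 for matching. The names `execute` and `exec_accuracy` are illustrative, and the repository's actual executor, program format, and scoring rule may differ.

```python
import math
import re

# Subset of the FinQA/ConvFinQA arithmetic DSL (assumption: table ops and
# comparison ops such as greater() are omitted for brevity).
OPS = {
    "add": lambda a, b: a + b,
    "subtract": lambda a, b: a - b,
    "multiply": lambda a, b: a * b,
    "divide": lambda a, b: a / b,
    "exp": lambda a, b: a ** b,
}

# One step looks like name(arg1, arg2); numbers must not use thousands separators.
STEP = re.compile(r"(\w+)\(([^,()]+),\s*([^()]+)\)")


def parse_arg(token, results):
    """Resolve a step reference (#n), a DSL constant (const_100, const_m1), or a number."""
    token = token.strip()
    if token.startswith("#"):
        return results[int(token[1:])]
    if token.startswith("const_"):
        value = token[len("const_"):]
        # "m" prefix marks a negative constant, e.g. const_m1 -> -1.0
        return -float(value[1:]) if value.startswith("m") else float(value)
    return float(token)


def execute(program):
    """Run each step in order and return the value of the last step."""
    results = []
    for name, a, b in STEP.findall(program):
        results.append(OPS[name](parse_arg(a, results), parse_arg(b, results)))
    return results[-1]  # IndexError if no step was parsed


def exec_accuracy(predicted, gold, rel_tol=1e-3):
    """Fraction of predicted programs whose executed value matches gold (hypothetical tolerance)."""
    hits = 0
    for program, answer in zip(predicted, gold):
        try:
            value = execute(program)
        except (KeyError, IndexError, ValueError, ZeroDivisionError, OverflowError):
            continue  # an unexecutable program counts as wrong
        if math.isclose(value, answer, rel_tol=rel_tol, abs_tol=1e-9):
            hits += 1
    return hits / len(gold)
```

Scoring the executed value rather than the program text means two different but equivalent programs both count as correct. That is the usual reason to prefer execution accuracy over exact program match.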