All API providers return actual token counts in their response metadata. Capture prompt_tokens and completion_tokens per call and store in each dictResult alongside predicted_score.\n\nAt run completion, write {results_dir}/cost_summary.json:\n\njson\n{\n "model": "gpt-4o",\n "variant": "female",\n "total_prompt_tokens": 42000,\n "total_completion_tokens": 1400,\n "actual_cost_usd": 0.44\n}\n\n\nActual cost = ground truth for future cost estimation calibration.
All API providers return actual token counts in their response metadata. Capture
prompt_tokensandcompletion_tokensper call and store in eachdictResultalongsidepredicted_score.\n\nAt run completion, write{results_dir}/cost_summary.json:\n\njson\n{\n "model": "gpt-4o",\n "variant": "female",\n "total_prompt_tokens": 42000,\n "total_completion_tokens": 1400,\n "actual_cost_usd": 0.44\n}\n\n\nActual cost = ground truth for future cost estimation calibration.