[bug] [draft] continuous action logprob precision mismatch #584

Draft
yacineMTB wants to merge 2 commits into PufferAI:4.0 from yacineMTB:fix-continuous-action-logprob-4.0

Conversation

@yacineMTB

Still working on reproductions and baseline sweeps.

[image]

The better one is post-fix.

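Basically: if precision_t is bf16/fp16, the action sampled in fp32 gets rounded when it is stored, so the logprob computed from the stored action no longer matches the one for the action that was actually sampled, and that messes with things downstream in PPO.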

Discrete actions don't have this problem because they are integers.
This diff does improve things (tested in a sweep), but the code smells bad, so I need to reread it tomorrow morning. Keeping this as a draft.
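
To make the mismatch concrete, here is a minimal host-side C++ sketch, not code from this PR. It assumes a diagonal Gaussian policy and emulates bf16 storage by rounding the fp32 bit pattern to nearest-even. The helpers `round_to_bf16`, `gaussian_logprob`, and `ppo_ratio_after_storage` are hypothetical names made up for illustration; the real kernels in src/pufferlib.cu may differ.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

// Emulate storing an fp32 value in a bf16 buffer and reading it back:
// keep the top 16 bits of the fp32 pattern, rounding to nearest-even.
// Hypothetical stand-in for a bf16 precision_t; NaN/Inf are not handled.
static float round_to_bf16(float x) {
    assert(std::isfinite(x));
    std::uint32_t bits;
    std::memcpy(&bits, &x, sizeof(bits));
    std::uint32_t lsb = (bits >> 16) & 1u;  // lowest bit that survives
    bits += 0x7FFFu + lsb;                  // round to nearest, ties to even
    bits &= 0xFFFF0000u;                    // drop the low 16 mantissa bits
    float out;
    std::memcpy(&out, &bits, sizeof(out));
    return out;
}

// Log-density of a Gaussian for a single action dimension.
static float gaussian_logprob(float action, float mean, float log_std) {
    const float kHalfLog2Pi = 0.9189385332f;  // 0.5 * log(2 * pi)
    float z = (action - mean) / std::exp(log_std);
    return -0.5f * z * z - log_std - kHalfLog2Pi;
}

// PPO probability ratio for an UNCHANGED policy when the rollout logprob
// uses the fp32 sample but the training logprob uses the stored action.
// Without rounding this would be exactly 1.
static float ppo_ratio_after_storage(float sampled, float mean, float log_std) {
    float stored = round_to_bf16(sampled);
    float lp_rollout = gaussian_logprob(sampled, mean, log_std);
    float lp_train = gaussian_logprob(stored, mean, log_std);
    return std::exp(lp_train - lp_rollout);
}
```

Calling `ppo_ratio_after_storage(0.7123456f, 0.3f, -2.0f)` (made-up values) gives a ratio noticeably different from 1 even though the policy has not changed, while a sample that is exactly representable in bf16, such as `0.5f`, gives exactly 1. The effect grows as the standard deviation shrinks, since the rounding error is divided by it.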

Comment thread src/pufferlib.cu Outdated
  float mean = to_float(a.logits[logits_base + h * a.logits_stride_a]);
  float log_std = to_float(a.logstd[h]);
- float action = float(g.actions[nt * a.num_atns + h]);
+ float action = to_float(g.actions[nt * a.num_atns + h]);

Author

unnecessary..

@yacineMTB force-pushed the fix-continuous-action-logprob-4.0 branch from 4dd4990 to af45050 on June 13, 2026 03:37