Hi, thanks for sharing the code.
I think the DeepMind version, def speculative_sampling_v2, should be fixed, because the way it compares p's and q's probability distributions differs from the algorithm in the DeepMind paper.
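For reference, in Algorithm 2 of the DeepMind paper q is the target model and p is the draft model. A draft token $\tilde{x}_t$ is accepted when a uniform sample $r \sim U[0, 1]$ satisfies the condition below (the prefix is written here as $x_{<t}$ for brevity). Otherwise a replacement token is sampled from the normalized residual $(q - p)_+$:

$$
r < \min\left(1, \frac{q(\tilde{x}_t \mid x_{<t})}{p(\tilde{x}_t \mid x_{<t})}\right),
\qquad
x_t \sim \frac{\max\left(0,\ q(\cdot \mid x_{<t}) - p(\cdot \mid x_{<t})\right)}{\sum_x \max\left(0,\ q(x \mid x_{<t}) - p(x \mid x_{<t})\right)}
$$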
[current code]
if r < torch.min(torch.tensor([1], device=q.device), p[:, prefix_len + i - 1, draft_token_idx] / q[:, prefix_len + i - 1, draft_token_idx]):
[correct code]
if r < torch.min(torch.tensor([1], device=q.device), q[:, prefix_len + i - 1, draft_token_idx] / p[:, prefix_len + i - 1, draft_token_idx]):
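To make the comparison concrete, here is a minimal pure-Python sketch of the acceptance step as I understand it from Algorithm 2 of the paper, using the paper's notation (q = target model, p = draft model). The names `accept_draft_token`, `residual_distribution` and `speculative_step` are hypothetical and not from the repository. The sketch works on a single position's probability vectors rather than the batched tensors indexed in the code above.

```python
import random
from typing import List, Sequence


def accept_draft_token(
    target_probs: Sequence[float],  # q(. | x_<t): target model, paper notation
    draft_probs: Sequence[float],   # p(. | x_<t): draft model, paper notation
    token: int,                     # token proposed by the draft model
    r: float,                       # uniform sample from [0, 1)
) -> bool:
    """Accept the draft token if r < min(1, q(token) / p(token))."""
    # draft_probs[token] > 0 because the token was sampled from the draft model.
    ratio = target_probs[token] / draft_probs[token]
    return r < min(1.0, ratio)


def residual_distribution(
    target_probs: Sequence[float], draft_probs: Sequence[float]
) -> List[float]:
    """Normalized max(0, q - p), used to resample after a rejection."""
    diff = [max(0.0, q - p) for q, p in zip(target_probs, draft_probs)]
    # The sum is > 0 whenever a rejection is possible (i.e. q != p).
    total = sum(diff)
    return [d / total for d in diff]


def speculative_step(
    target_probs: Sequence[float],
    draft_probs: Sequence[float],
    token: int,
    rng: random.Random,
) -> int:
    """Return the draft token if accepted, else a token from the residual."""
    if accept_draft_token(target_probs, draft_probs, token, rng.random()):
        return token
    residual = residual_distribution(target_probs, draft_probs)
    return rng.choices(range(len(residual)), weights=residual, k=1)[0]
```

For example, with q = [0.1, 0.6, 0.3] and p = [0.5, 0.2, 0.3], token 0 is accepted with probability 0.1 / 0.5 = 0.2, and a rejection always resamples token 1, the only token where q exceeds p. With the ratio inverted to p / q (paper notation), token 0 would get 0.5 / 0.1 = 5 and would always be accepted, even though the target model assigns it only 0.1.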