This repository was archived by the owner on May 16, 2025. It is now read-only.

GPTRewardModel class #8

@israelpf

Why are the rewards truncated in the "GPTRewardModel" class, and where can I find more information about it?

        # Retrieve first index where trajectories diverge
        divergence_ind = (chosen[i] != rejected[i]).nonzero()[0]
        assert divergence_ind > 0

        # Index into the correct rewards
        c_truncated_reward = chosen_rewards[i][divergence_ind:end_ind]
        r_truncated_reward = rejected_rewards[i][divergence_ind:end_ind]
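
To make the question concrete, here is a toy reproduction of what the snippet above appears to compute. It is only a sketch: it uses plain Python lists instead of tensors, the helper name `truncate_rewards` and all values are made up for illustration, and `end_ind` is passed in by hand because the snippet does not show how it is derived.

```python
def truncate_rewards(chosen, rejected, chosen_rewards, rejected_rewards, end_ind):
    """Hypothetical helper mirroring the quoted snippet on plain lists."""
    # First position where the two token sequences differ
    # (raises StopIteration if the sequences are identical).
    divergence_ind = next(
        idx for idx, (c, r) in enumerate(zip(chosen, rejected)) if c != r
    )
    assert divergence_ind > 0

    # Keep only the per-token rewards from the divergence point up to end_ind
    c_truncated = chosen_rewards[divergence_ind:end_ind]
    r_truncated = rejected_rewards[divergence_ind:end_ind]
    return divergence_ind, c_truncated, r_truncated


# Toy data: both sequences share the first three tokens (5, 6, 7)
chosen = [5, 6, 7, 1, 2, 0]
rejected = [5, 6, 7, 3, 4, 9]
chosen_rewards = [0.1, 0.2, 0.3, 0.9, 0.8, 0.0]
rejected_rewards = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]

ind, c_trunc, r_trunc = truncate_rewards(
    chosen, rejected, chosen_rewards, rejected_rewards, end_ind=5
)
print(ind, c_trunc, r_trunc)  # 3 [0.9, 0.8] [0.4, 0.5]
```

In this toy case the rewards at positions 0 to 2, where the two sequences are identical, are dropped, and only positions 3 and 4 are kept.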

Thanks in advance
