bug: different behaviors with zero-2/3 on single GPU #69

@lkevinzc

Hi @cameron-chen and @lkevinzc,

The figure below shows the results of my single-GPU experiments with zero-2 and zero-3. The args file asserts that fuse_lm_head cannot be enabled when using zero-3, so I didn't run zero-3 with fuse_lm_head.

The results clearly show that zero-3 is able to learn, while zero-2 (with or without fuse_lm_head) is not able to learn.
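
For reference, the constraint and the three configurations compared above can be sketched as follows. This is only an illustrative sketch, not the project's actual args code: the `TrainArgs` class, the `zero_stage` field name, and the `single_gpu_configs` helper are hypothetical; only `fuse_lm_head` and the zero-2/zero-3 settings come from this report.

```python
from dataclasses import dataclass


@dataclass
class TrainArgs:
    """Hypothetical stand-in for the project's args; not the real class."""

    zero_stage: int = 2         # assumed field name for the ZeRO stage
    fuse_lm_head: bool = False  # flag named in this issue

    def __post_init__(self):
        # Mirror the constraint described above: zero-3 excludes fuse_lm_head.
        if self.zero_stage == 3 and self.fuse_lm_head:
            raise ValueError("fuse_lm_head cannot be enabled with zero-3")


def single_gpu_configs():
    """Enumerate the three single-GPU configurations compared in this issue."""
    return [
        TrainArgs(zero_stage=2, fuse_lm_head=False),
        TrainArgs(zero_stage=2, fuse_lm_head=True),
        TrainArgs(zero_stage=3, fuse_lm_head=False),
    ]
```

Under these assumptions, the two zero-2 configurations are the ones that fail to learn, and the zero-3 configuration is the one that learns.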

[Figure: single-GPU training results for zero-2 (with and without fuse_lm_head) and zero-3]

Originally posted by @hmhuy0 in #68 (comment)
