Question about training dflash for a large target model #546

Description

@dongyibo

Hello, I want to train a Dflash draft model for a target model with a relatively large number of parameters. The target model (for example, DeepSeek v2, with 236B parameters) supports multiple dialogue formats such as think, no-think, and function/tool calling.

From experience, how much training data is appropriate for a model of this size?
