
[Feature] Adapt Jet-Nemotron-2B, a new hybrid-architecture language model #531

@loading66

Description

Motivation

Add support for the Jet-Nemotron-2B model. Jet-Nemotron was developed with PostNAS (Post Neural Architecture Search), an efficient post-training pipeline for exploring and adapting model architectures that can be applied to any pre-trained Transformer model. Its linear attention component, JetBlock, is a novel linear attention module that significantly outperforms previous designs such as Mamba2.

Related resources

No response
