
Making nanotron work out of the box #399

Open

giux78 wants to merge 8 commits into huggingface:main from mii-llm:main




giux78 commented Mar 11, 2026


We used nanotron to build a series of LLMs from scratch.

https://huggingface.co/collections/mii-llm/zagreus-04b

To get nanotron running, computing the loss, and resuming from a checkpoint, and to handle some edge cases, we created a working fork. With this PR we would like to contribute those fixes back. They address the following issues:

  • When resuming from a checkpoint, in one edge case the learning rate reached zero and training could not be resumed
  • The parameters passed to the superclass were outdated and incorrect
  • Sometimes the index loaded from the blendable dataset and its metadata were misaligned; this PR handles that case more robustly
  • `parametrizator_cls` expects `config`, not `config.model`
  • There was an issue with `DataFolder` objects being used as keys in the metadata dicts
  • There was a problem with our configuration of the `LOCAL_RANK` environment variable
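To illustrate the kind of learning-rate edge case in the first item, here is a minimal, generic sketch. It is not nanotron's scheduler and not the actual fix in this PR: `lr_at_step` and its parameters are hypothetical. It assumes a linear warmup followed by linear decay and shows two guards. The first clamps the decay progress when the step restored from a checkpoint lies past the end of the schedule. The second floors the result at `min_lr`, so that with `min_lr > 0` the optimizer never gets a zero learning rate.

```python
def lr_at_step(step, peak_lr, warmup_steps, total_steps, min_lr):
    """Hypothetical linear-warmup / linear-decay schedule (illustration only)."""
    if step < warmup_steps:
        # Linear warmup from peak_lr / warmup_steps up to peak_lr.
        return peak_lr * (step + 1) / warmup_steps
    # Fraction of the decay phase completed, clamped to 1.0 so that a step
    # restored from a checkpoint beyond total_steps cannot overshoot.
    decay_steps = max(1, total_steps - warmup_steps)
    progress = min(1.0, (step - warmup_steps) / decay_steps)
    lr = peak_lr - (peak_lr - min_lr) * progress
    # Guard against floating-point undershoot and keep the result >= min_lr.
    return max(lr, min_lr)


if __name__ == "__main__":
    # Resuming at step 5000 of a 1000-step schedule stays at min_lr, not 0.
    print(lr_at_step(5000, 3e-4, 100, 1000, 3e-5))
```

With `min_lr = 0` the floor does not help, so a resumed run past the end of the schedule still gets a zero learning rate. That is why this sketch relies on a positive `min_lr`.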

