You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
moe-engine is a research-grade infrastructure layer for training large Mixture-of-Experts language models at hyperscale. It is designed around one core constraint: at 10K+ GPUs, nodes die continuously. The system must keep training alive end-to-end — routing correctly, checkpointing durably, and resuming without operator intervention.