Codex searchshortqa deepspeed by daiweidaichenzi · Pull Request #791 · jingyaogong/minimind

daiweidaichenzi · 2026-06-02T04:13:10Z

No description provided.

- Add model_minimind_mla.py with Q/KV low-rank compression and decoupled RoPE
- Add kv_norm on kv_latent, q_compress + q_up + q_rope_proj for Q path
- Cache format: (kv_latent, k_rope) tuple, 176 floats/tok/layer (4.4x compression)
- Add --use_mla flag to all training scripts and utils
- Add benchmark_gqa_vs_mla.py script
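The cache figures in the list above can be checked with some quick arithmetic. The sketch below is illustrative only: the PR states the 176 floats/tok/layer total and the 4.4x ratio, but not how the 176 splits between `kv_latent` and `k_rope`, nor the GQA baseline dimensions. The values used here (a 128 + 48 split, 4 KV heads of dimension 96) are assumptions chosen because they reproduce the stated numbers, and the helper names are hypothetical.

```python
def gqa_cache_floats(num_kv_heads: int, head_dim: int) -> int:
    """Floats cached per token per layer under GQA: one K and one V vector per KV head."""
    return 2 * num_kv_heads * head_dim


def mla_cache_floats(kv_latent_dim: int, rope_dim: int) -> int:
    """Floats cached per token per layer under MLA: the shared KV latent plus the decoupled RoPE key."""
    return kv_latent_dim + rope_dim


# ASSUMPTION: these dimensions are not given in the PR; they are picked to match
# the stated total of 176 floats/tok/layer and the stated ~4.4x compression.
gqa = gqa_cache_floats(num_kv_heads=4, head_dim=96)
mla = mla_cache_floats(kv_latent_dim=128, rope_dim=48)
ratio = gqa / mla

print(f"GQA: {gqa} floats/tok/layer, MLA: {mla} floats/tok/layer, ratio: {ratio:.1f}x")
```

Under these assumed dimensions the MLA cache stores a single latent vector and a single RoPE key per token rather than per-head K and V tensors, which is where the saving would come from.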


daiweidaichenzi and others added 20 commits May 8, 2026 20:44

add Chinese annotations pretrain

9c23a92

Add MLA attention model

49a05a1

merge local changes

de683ea

fix: add missing --use_mla to train_full_sft.py

8baa9c5

Merge branch 'master' of github.com:daiweidaichenzi/minimind

09e491f

[feat] add GQA vs MLA evaluation script (PPL + Q&A)

15645bb

fix: extract tools and tool_calls before apply_chat_template in PPL eval

579cc79

[feat] add DeepSpeed pretrain (ZeRO-2, BF16, auto LR, gradient accumulation)

ed006a2

fix: avoid double init_process_group in DeepSpeed pretrain

4f5cdfe

chore: add deepspeed suffix to save_weight, update default data_path

907f31f

chore: remove unused import

46dabcf

fix: switch DeepSpeed config from bf16 to fp16, ZeRO-2 to ZeRO-1 (RTX 3090 compat)

2201060

feat: add SearchShortQA DeepSpeed pipeline

33233fe

fix: make DeepSpeed resume checkpoints robust

0425c4f

fix: keep only latest DeepSpeed checkpoint

e9c9395

fix: recover DeepSpeed resume step from checkpoint tag

4dcda45

feat: add pretrain evaluation scripts

eaf3a8c

fix: harden DeepSpeed SFT resume

5012b23

agentic

3233d98
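For orientation, the commit that switches the DeepSpeed config from bf16 to fp16 and from ZeRO-2 to ZeRO-1 implies a configuration roughly like the sketch below. The actual config in this PR is not shown on this page, so the helper name `build_ds_config` and the batch-size and accumulation values are hypothetical; only the config keys themselves are standard DeepSpeed settings.

```python
import json


def build_ds_config(micro_batch_size: int, grad_accum_steps: int) -> dict:
    """Build a minimal DeepSpeed config dict with ZeRO stage 1 and fp16 mixed precision."""
    return {
        "train_micro_batch_size_per_gpu": micro_batch_size,
        "gradient_accumulation_steps": grad_accum_steps,
        # fp16 instead of bf16, as named in the commit message
        "fp16": {"enabled": True},
        "bf16": {"enabled": False},
        # ZeRO-1 partitions optimizer states only; ZeRO-2 would also partition gradients
        "zero_optimization": {"stage": 1},
    }


# ASSUMPTION: the batch size and accumulation steps are placeholders, not values from the PR.
ds_config = build_ds_config(micro_batch_size=8, grad_accum_steps=4)
print(json.dumps(ds_config, indent=2))
```

A dict like this can be passed as the `config` argument of `deepspeed.initialize` or written out as a JSON file for the launcher.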

daiweidaichenzi wants to merge 20 commits into jingyaogong:master from daiweidaichenzi:codex-searchshortqa-deepspeed


2 participants
