Fixing block size for Mistral-7B. by Artyom17 · Pull Request #141 · meta-pytorch/gpt-fast

Artyom17 · 2024-03-19T02:10:59Z

According to Mistral's paper, the block size for Mistral-7B should be 8192 (refs: https://arxiv.org/pdf/2310.06825.pdf, https://huggingface.co/docs/transformers/en/model_doc/mistral). Currently it is set to the default value (2048).
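As an illustration of the kind of change described, here is a minimal sketch of a per-model config override. The names `ModelArgs`, `MODEL_OVERRIDES`, and `config_for` are hypothetical and simplified for this example; they are not copied from the repository, and only the two values 2048 (default) and 8192 (Mistral-7B) come from the description above.

```python
from dataclasses import dataclass


@dataclass
class ModelArgs:
    # Default block size mentioned in the PR description.
    block_size: int = 2048


# Hypothetical per-model overrides; only the Mistral-7B value is taken
# from the PR description.
MODEL_OVERRIDES = {
    "Mistral-7B": {"block_size": 8192},
}


def config_for(name: str) -> ModelArgs:
    """Return the default args with any overrides for `name` applied."""
    return ModelArgs(**MODEL_OVERRIDES.get(name, {}))


mistral = config_for("Mistral-7B")
other = config_for("some-other-model")
print(mistral.block_size, other.block_size)
```

A model with no override entry keeps the 2048 default, which matches the behavior the description says Mistral-7B had before this fix.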


Artyom17 · 2024-03-19T23:42:12Z

It also saves some memory on the 'freq_cis' tensor when a large block_size is used with a relatively small max_seq_length.
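The memory saving can be sketched as follows, assuming the rotary table is precomputed with one row per position and that positions at or beyond max_seq_length are never looked up. This is a standard-library illustration only; `precompute_rope_table` and `rope_table_rows` are hypothetical helpers, the head dimension of 128 is an assumed example value, and the PR's actual change may differ.

```python
import math


def precompute_rope_table(seq_len: int, head_dim: int, base: float = 10000.0):
    """Return a seq_len x (head_dim // 2) table of (cos, sin) pairs."""
    inv_freqs = [1.0 / (base ** (2 * i / head_dim)) for i in range(head_dim // 2)]
    return [
        [(math.cos(pos * f), math.sin(pos * f)) for f in inv_freqs]
        for pos in range(seq_len)
    ]


def rope_table_rows(block_size: int, max_seq_length: int) -> int:
    # Assumption: rows past max_seq_length are never read, so the table
    # does not need to cover the full block_size.
    return min(block_size, max_seq_length)


block_size, max_seq_length, head_dim = 8192, 512, 128
rows = rope_table_rows(block_size, max_seq_length)
table = precompute_rope_table(rows, head_dim)

# Ratio of rows allocated with and without the cap.
saving_factor = block_size // rows
print(len(table), len(table[0]), saving_factor)
```

With these example numbers the capped table has 512 rows instead of 8192, so the table is 16 times smaller; the gain grows with the gap between block_size and max_seq_length.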

Fixing block size for Mistral-7B.

cc4cbec

facebook-github-bot added the CLA Signed label Mar 19, 2024

Saving some memory on freq_cis tensor with big block_size but small max_seq_length

8ca548c

Merge branch 'main' into art/fix-mistral

f21da73

Artyom17 wants to merge 3 commits into meta-pytorch:main from SesameAILabs:art/fix-mistral
