Fixing block size for Mistral-7B. #141

Open
Artyom17 wants to merge 3 commits into meta-pytorch:main from SesameAILabs:art/fix-mistral
@Artyom17

Contributor

According to Mistral's paper, the block size for Mistral-7B should be 8192 (refs: https://arxiv.org/pdf/2310.06825.pdf, https://huggingface.co/docs/transformers/en/model_doc/mistral), but it is currently left at the default value (2048).
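For illustration only, here is a minimal sketch of how a per-model override of a default block size could look. The `ModelArgs` dataclass, the `transformer_configs` table, the `from_name` helper, and the `some-other-model` entry are all hypothetical and trimmed down; they are not copied from the repository. Only the two numbers (2048 as the default, 8192 for Mistral-7B) come from the description above.

```python
from dataclasses import dataclass


@dataclass
class ModelArgs:
    # Hypothetical, trimmed-down config: only block_size is modeled here.
    block_size: int = 2048  # default value named in the PR description


# Hypothetical per-model overrides. A model that omits block_size
# silently falls back to the dataclass default.
transformer_configs = {
    "Mistral-7B": dict(block_size=8192),  # value proposed in the PR description
    "some-other-model": dict(),           # placeholder entry, uses the default
}


def from_name(name: str) -> ModelArgs:
    """Build a config from the override table (illustrative helper)."""
    return ModelArgs(**transformer_configs[name])


mistral = from_name("Mistral-7B")
other = from_name("some-other-model")
print(mistral.block_size, other.block_size)
```

With an entry like this, a model whose override table omits `block_size` keeps the 2048 default, which is the situation the description reports for Mistral-7B before the change.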

@facebook-github-bot added the CLA Signed label on Mar 19, 2024
@Artyom17

Contributor Author

It also saves some memory on the 'freqs_cis' tensor when a large block_size is used with a relatively small max_seq_length.
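A rough back-of-the-envelope sketch of that saving follows. It assumes the rotary table stores one (cos, sin) pair per position for each of `head_dim // 2` frequencies at 2 bytes per element (for example bfloat16); that layout and the helper name `rope_table_bytes` are assumptions for illustration, not taken from the PR. The head dimension of 128 follows from Mistral-7B's hidden size of 4096 split over 32 heads, and the max_seq_length of 512 is an arbitrary example.

```python
def rope_table_bytes(seq_len: int, head_dim: int, bytes_per_elem: int = 2) -> int:
    """Approximate size of a rotary-embedding table.

    Assumed layout: (seq_len, head_dim // 2, 2), i.e. a cos/sin pair
    for every position and every frequency.
    """
    return seq_len * (head_dim // 2) * 2 * bytes_per_elem


HEAD_DIM = 4096 // 32  # 128 for Mistral-7B (hidden size / number of heads)

sized_by_block = rope_table_bytes(8192, HEAD_DIM)   # table sized by block_size
sized_by_max_seq = rope_table_bytes(512, HEAD_DIM)  # table sized by an example max_seq_length

print(sized_by_block, sized_by_max_seq, sized_by_block // sized_by_max_seq)
```

Under these assumptions the table shrinks from 2 MiB to 128 KiB, a factor of 16. The absolute numbers are small next to the model weights, so the benefit is modest.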
