Resolve tokenizer from local cache in offline mode #249

Open
sarathfrancis90 wants to merge 1 commit into mistralai:main from sarathfrancis90:fix-offline-tokenizer-cache-resolution

@sarathfrancis90

Fixes #248

Bug

With HF_HUB_OFFLINE=1 and a model fully present in the local cache, download_tokenizer_from_hf_hub(repo_id, revision="main") crashes instead of resolving the tokenizer from the cache. This is hit in practice via vLLM / transformers v5, which route any model shipping tekken.json to MistralCommonBackend and pass revision="main".

Cause

Two issues combine in tokens/tokenizers/utils.py:

  1. In offline mode, huggingface_hub raises OfflineModeIsEnabled, which subclasses the builtin ConnectionError rather than any requests.* exception. The except (requests.ConnectionError, requests.HTTPError, requests.Timeout) clause around list_repo_files therefore misses it, and the existing local-files fallback is skipped.
  2. list_local_hf_repo_files only consulted refs/<DEFAULT_REVISION> when revision was None. An explicit branch/tag like "main" was looked up as a literal snapshots/main directory, which never exists (snapshots are keyed by commit hash), so it returned [].
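To illustrate the first issue, here is a minimal sketch using stand-in classes rather than the real requests and huggingface_hub types. The names RequestsConnectionError, list_remote_files, and outcome are hypothetical; only the class relationships (requests.ConnectionError derives from OSError, OfflineModeIsEnabled derives from the builtin ConnectionError) mirror what is described above.

```python
# Stand-ins that mirror the class hierarchy described above. The real
# classes live in requests and huggingface_hub; these are illustrative only.
class RequestsConnectionError(OSError):
    """Stand-in for requests.ConnectionError (an OSError subclass)."""


class OfflineModeIsEnabled(ConnectionError):
    """Stand-in for huggingface_hub.errors.OfflineModeIsEnabled."""


def list_remote_files(caught):
    """Simulate the guarded list_repo_files call failing in offline mode."""
    try:
        raise OfflineModeIsEnabled("HF_HUB_OFFLINE=1")
    except caught:
        # Reached only if the raised error matches the except tuple.
        return "local-files fallback"


def outcome(caught):
    """Report whether the fallback was reached or the error escaped."""
    try:
        return list_remote_files(caught)
    except ConnectionError:
        return "crash"


# Before: the tuple holds only requests-style errors, so the error escapes.
before = outcome((RequestsConnectionError,))
# After: the offline error is part of the tuple, so the fallback runs.
after = outcome((RequestsConnectionError, OfflineModeIsEnabled))
```

The two exception types are siblings under OSError, so catching one never catches the other.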

Fix

  • Add huggingface_hub.errors.OfflineModeIsEnabled to the caught exceptions so offline mode reaches the same local-files fallback as a dropped connection (force_download still re-raises, unchanged).
  • Resolve any branch/tag revision to its commit hash via refs/<revision> before reading the snapshot directory.
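The ref-resolution step can be sketched roughly as follows. This is not the code from the PR: resolve_snapshot_dir, list_snapshot_files, build_fake_cache, and the commit hash are hypothetical, and the sketch assumes only the cache layout described above (refs/<revision> holding a commit hash, snapshots keyed by that hash).

```python
import tempfile
from pathlib import Path

COMMIT = "0123456789abcdef0123456789abcdef01234567"  # made-up commit hash


def resolve_snapshot_dir(repo_cache: Path, revision: str) -> Path:
    """Map a branch/tag to its commit hash via refs/<revision>, if present."""
    ref_file = repo_cache / "refs" / revision
    if ref_file.is_file():
        # The ref file contains the commit hash the branch/tag points to.
        revision = ref_file.read_text().strip()
    # Otherwise treat the revision as a commit hash already.
    return repo_cache / "snapshots" / revision


def list_snapshot_files(repo_cache: Path, revision: str) -> list[str]:
    """Return file names in the resolved snapshot, or [] if it is absent."""
    snapshot = resolve_snapshot_dir(repo_cache, revision)
    if not snapshot.is_dir():
        return []
    return sorted(p.name for p in snapshot.iterdir() if p.is_file())


def build_fake_cache(root: Path) -> Path:
    """Create a tiny cache-like tree: refs/main -> COMMIT, one snapshot file."""
    repo_cache = root / "models--org--name"
    (repo_cache / "refs").mkdir(parents=True)
    (repo_cache / "refs" / "main").write_text(COMMIT)
    snapshot = repo_cache / "snapshots" / COMMIT
    snapshot.mkdir(parents=True)
    (snapshot / "tekken.json").write_text("{}")
    return repo_cache


with tempfile.TemporaryDirectory() as tmp:
    repo_cache = build_fake_cache(Path(tmp))
    by_branch = list_snapshot_files(repo_cache, "main")
    by_commit = list_snapshot_files(repo_cache, COMMIT)
    missing = list_snapshot_files(repo_cache, "no-such-branch")
```

Falling back to the literal revision when no ref file exists keeps explicit commit hashes working, while a branch or tag that was never cached still yields an empty list.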

Testing

  • Added a revision="main" case to test_list_local_hf_repo_files and a new test_download_tokenizer_from_hf_hub_offline_mode; both fail before the change and pass after.
  • pytest tests/test_utils.py (23 passing), ruff check, ruff format --check, and mypy src are all green.

download_tokenizer_from_hf_hub crashed offline (HF_HUB_OFFLINE=1) when a
branch/tag revision such as "main" was passed:

- huggingface_hub raises OfflineModeIsEnabled, a builtin ConnectionError
  subclass that is not a requests.* error, so the existing except tuple
  missed it and the local-files fallback was skipped.
- list_local_hf_repo_files only resolved refs/<DEFAULT_REVISION> when
  revision was None, so an explicit "main" was looked up as a literal
  snapshots/main directory (which never exists) and returned no files.

Catch OfflineModeIsEnabled alongside the requests errors, and resolve any
branch/tag revision to its commit hash via refs/<revision> before reading
the snapshot directory.