
Fix file cache to support offline usage of previously downloaded files #586

Open
Chessing234 wants to merge 1 commit into allenai:main from Chessing234:fix/file-cache-offline


@Chessing234


Fixes #535

Bug

get_from_cache() always sends an HTTP HEAD request before checking the local cache. After a successful download, loading the linker again fails offline because the HEAD request errors before the cached file is returned.

Root cause

get_from_cache() calls requests.head(url) unconditionally to obtain an ETag and derive the cache filename, even when a matching cached file already exists under ~/.scispacy/datasets/.

Why this fix is correct

Cached filenames always start with the SHA-256 hash of the URL, so the fix scans the cache directory for an existing file with that prefix and returns it before making any network call. When no cached file is present, a fresh download still uses HEAD to resolve the ETag.
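
For illustration, the prefix lookup described above could look roughly like the sketch below. This is not the code in this PR: `find_cached_file` is a hypothetical helper name, and skipping `.json` files assumes that metadata sidecar files may sit next to the cached data under the same hash prefix.

```python
import hashlib
from pathlib import Path
from typing import Optional


def find_cached_file(url: str, cache_dir: Path) -> Optional[Path]:
    """Return a cached file whose name starts with the SHA-256 hex digest of `url`.

    Hypothetical helper for illustration; no network access is performed.
    """
    url_hash = hashlib.sha256(url.encode("utf-8")).hexdigest()
    if not cache_dir.is_dir():
        return None
    # Sort for a deterministic choice if several files share the prefix.
    for candidate in sorted(cache_dir.iterdir()):
        if not candidate.is_file():
            continue
        # Assumption: ".json" files are metadata sidecars, not the cached payload.
        if candidate.name.endswith(".json"):
            continue
        if candidate.name.startswith(url_hash):
            return candidate
    return None
```

A caller would try this lookup first and fall back to the existing HEAD-then-download path only when it returns `None`. One trade-off of this approach is that a cached file is reused even if the remote ETag has since changed.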

Made with Cursor

Check the local cache before calling requests.head so previously
downloaded linker data works offline.

Co-authored-by: Cursor <cursoragent@cursor.com>

Development

Successfully merging this pull request may close these issues.

SciSpacy UmlsLinkerPaths Linker does not cache!
