Fixed prerelease partner loading issue#1277
Conversation
PR Review: Fixed prerelease partner loading issueThe fix is correct and well-structured. The core approach — separating the public Important
Suggestions
Otherwise looks good
|
There was a problem hiding this comment.
Pull request overview
This PR addresses schema loading issues around prerelease/library schemas by adjusting cache lookup defaults, improving error reporting, and adding auto-refresh behavior when a requested schema version is missing from the default cache.
Changes:
- Change
hed_cache.get_hed_versions()default to exclude prereleases and require explicitcheck_prerelease=Truewhen needed. - Update schema loading/error paths to include prerelease versions in “known versions” reporting and adjust the error code/message when no default version can be resolved from cache.
- Add
get_hed_version_path()auto-refresh behavior (refresh cache from GitHub on default-cache misses) and extend test coverage accordingly.
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 5 comments.
Show a summary per file
| File | Description |
|---|---|
tests/schema/test_hed_schema_io.py |
Updates assertions for the new exception code/message when resolving default schema version fails. |
spec_tests/test_hed_cache.py |
Adjusts prerelease version tests for the new default and adds tests for get_hed_version_path() auto-refresh behavior. |
hed/schema/hed_schema_io.py |
Changes default-version resolution error code/message and includes prereleases in “known versions” lists on failure. |
hed/schema/hed_cache.py |
Changes get_hed_versions() default prerelease behavior and adds auto-refresh + helper lookup for get_hed_version_path(). |
CHANGELOG.md |
Updates narrative around prerelease/version lookup behavior and API defaults. |
| # Version not found locally — try refreshing cache from GitHub (default cache only) | ||
| if not xml_version or local_hed_directory != HED_CACHE_DIRECTORY: | ||
| return None | ||
|
|
||
| cache_xml_versions() | ||
| return _find_hed_version_path(xml_version, library_name, local_hed_directory) |
There was a problem hiding this comment.
get_hed_version_path now calls cache_xml_versions() on a cache miss in the default cache. cache_xml_versions downloads and caches all standard + library schemas, which can be very expensive and surprising for a simple "path lookup" API (latency, bandwidth, GitHub rate limits). Consider a lighter-weight refresh (e.g., only fetch the requested version, or only refresh an index/manifest) and/or an explicit opt-in flag so callers can control network side effects.
|
PR Review placeholder - testing auth |
|
Part 1 of review (see next comment for Part 2): The fix is sound and well-targeted. Two issues worth addressing before merge. --- ISSUE 1: get_hed_version_path() docstring is outdated. The docstring still says 'Only returns filenames that exist', which was accurate before but is now misleading. After this PR the function can trigger a network download via cache_xml_versions() when the version is not found locally. The local_hed_directory parameter description also does not document the auto-download asymmetry between the default cache and a custom directory. Suggested replacement: 'Get the HED XML file path for a given version. Searches the local cache first. If the version is not found and local_hed_directory is the default HED cache, the cache is refreshed from GitHub before a second lookup. No network call is made for custom directories. Parameters: xml_version (str): The version string to look up. library_name (str or None): Optional schema library name. local_hed_directory (str or None): Path to local HED directory. Defaults to HED_CACHE_DIRECTORY. Passing a custom path disables the automatic GitHub refresh. Returns: Union[str, None]: The path to the requested HED XML file, or None.' |
|
Part 2 of review: ISSUE 2: silent cache_xml_versions() failure is unobservable. In the new get_hed_version_path() the return value of cache_xml_versions() is silently discarded. It returns -1 for any failure (network error, lock contention, or 'cached too recently'). When it returns -1 the second _find_hed_version_path() call scans an unchanged cache and returns None, but the caller cannot tell whether the version genuinely does not exist or the download simply failed. A brief comment, or a logger.debug on the -1 case, would make the intent clear to future maintainers. --- Everything else looks good: the check_prerelease default revert is correctly documented, the error-code change from FILE_NOT_FOUND to BAD_PARAMETERS for the empty-cache-no-version path is semantically correct, and the new tests cover both the custom-directory no-refresh path and the default-cache auto-refresh path. |
|
PR Review: Fixed prerelease partner loading issue The fix is correct and well-structured. Separating the get_hed_versions() default (released-only) from the internal path-lookup helper (always includes prerelease) cleanly solves the prerelease partner loading problem without a silent API break. Important: cache_xml_versions() failure is unobservable (hed/schema/hed_cache.py): The return value of cache_xml_versions() is silently discarded. It returns -1 for network errors, lock contention, and rate-limit cases. In all these cases the caller receives None and cannot distinguish 'version does not exist' from 'download failed'. The comment in the code documents the intent, but a logger.warning on the -1 return would make failures observable without changing the control flow. Inline comment added. Suggestion: get_hed_versions() docstring (hed/schema/hed_cache.py line ~91): The check_prerelease description still reads 'Pass False to get only released versions (used by compliance checks)', framing False as a special-case override when it is now the default. Suggest: 'Default is False, returning only released versions. Pass True to include prerelease schemas.' Otherwise looks good:
|
| if not xml_version or local_hed_directory != HED_CACHE_DIRECTORY: | ||
| return None | ||
|
|
||
| cache_xml_versions() |
There was a problem hiding this comment.
The return value of cache_xml_versions() is silently discarded. It returns -1 on network errors, lock contention, and rate-limit/too-recent cases. The comment above documents this correctly, but a log line on failure would make the silent skip observable:
| cache_xml_versions() | |
| result = cache_xml_versions() | |
| if result == -1: | |
| import logging | |
| logging.getLogger(__name__).debug( | |
| "cache_xml_versions() could not refresh (network error, lock contention, or cached recently); " | |
| "version %r may not be found.", xml_version | |
| ) |
Without logging, a transient network failure is completely invisible — callers just get None with no way to distinguish "version does not exist" from "download silently failed".
No description provided.