Fix container pip install for Singularity (follow-up to #4380) by esutlie · Pull Request #4438 · SpikeInterface/spikeinterface

esutlie · 2026-03-12T19:08:31Z

Summary

Pass the pip install command as a list instead of a string in the github mode of install_package_in_container(), so it works on both Docker and Singularity.

Context

PR #4380 fixed the #egg= deprecation by switching to Direct URL syntax, but the quoted string format only works on Docker. The two container backends split command strings differently:

Docker (exec_run) uses shlex.split(), which is shell-aware and strips quotes
Singularity (spython) uses str.split(" "), which splits naively and leaves quotes attached

So a command like pip install "pkg @ url" is parsed correctly by Docker but breaks on Singularity, where the naive split hands pip three separate arguments ("pkg, @, and url") with the quotes still attached. Passing the command as a list bypasses both splitting mechanisms, keeping the pkg @ url requirement as a single argument on both backends. This is already the pattern used in runsorter.py.
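A minimal sketch of the difference, using only the standard library. The requirement string and URL are made up for illustration, and the two split calls stand in for what the Docker and Singularity backends do as described above; this is not the actual container code.

```python
import shlex

# Hypothetical PEP 508 Direct URL requirement, for illustration only.
requirement = "pkg @ git+https://example.com/org/pkg.git"
command = f'pip install "{requirement}"'

# Docker-style parsing: shell-aware, so the quotes group the
# requirement into one token and are then stripped.
docker_args = shlex.split(command)

# Singularity-style parsing as described above: a naive split on
# single spaces, which breaks the requirement apart and keeps the quotes.
singularity_args = command.split(" ")

# List form: the command is already tokenized, so no splitting is needed.
list_args = ["pip", "install", requirement]

print(docker_args)
print(singularity_args)
print(list_args)
```

With the string form, only the shell-aware split reproduces the intended argument list; the list form is the same on both paths because nothing has to be re-parsed.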

Tested against real Docker containers with pip 26.0+ simulating both code paths, and confirmed working in a real Singularity workflow.

Fixes #4368

Pass the pip command as a list instead of a string in install_package_in_container() for github installation mode. The previous fix (PR SpikeInterface#4380) used a quoted string format, which works on Docker (shlex.split strips quotes) but fails on Singularity (str.split keeps quotes attached). Passing a list bypasses both splitting mechanisms, keeping the PEP 508 Direct URL requirement as a single argument on both backends.

Fixes SpikeInterface#4368


alejoe91 · 2026-03-12T20:11:28Z

Thank you Elissa! LGTM

The Singularity container pip install fix (esutlie/spikeinterface fix/pip-direct-url-quotes) has been merged upstream as SpikeInterface/spikeinterface#4438. Switch from fork pin to upstream main at the merge commit. Can move to a release pin (0.104.0) when it's available.

…nterface-pin Merging this myself since it's a dependency-only change. The fork fix (Singularity container pip install) was merged upstream today as SpikeInterface/spikeinterface#4438, so this just switches the pin from my fork to the upstream merge commit. No code changes. Moving the v0.2.0 tag forward to include this.

* fix: handle stale module state and json metadata in integration tests The full_pipeline fixture now patches db_prefix and streams_maker.schema_name to match the golden test config, clears cached streams module, and re-activates pipeline schemas when other integration tests have set a different prefix. Also handle MariaDB json-as-longtext for metadata assertions in tests. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: clean streams.py to catalog-only baseline for stable test imports streams.py previously contained experiment-specific dynamic tables that caused test failures due to stale schema references on import. Now the committed file has only catalog tables (StreamType, DeviceType, DeviceName, Device) while streams_maker.main() continues to append dynamic tables at runtime. Test fixtures use schema.activate() with database reset to rebind schemas across different test prefixes without needing file deletion. Also fixes ruff lint/format violations, adds save/restore of db_prefix in fixture teardown to prevent state leakage, and adds logging to teardown exception handlers. 
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * refactor: address PR #524 review feedback for load_metadata.py - Move `import typing` and `import re` to module-level imports instead of importing inside individual functions - Remove duplicate `import json` inside get_stream_reader_for_epoch (already imported at module level) - Fix Pydantic V3 deprecation: use `type(rig).model_fields` instead of `rig.model_fields` (instance-level access will break in Pydantic V3) - Fix ruff B007: rename unused loop variables to underscore-prefixed - Apply ruff format to normalize quote style Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * refactor: address remaining PR #524 review feedback - Move dict_to_uuid to utils/hashing.py (re-exported from utils/__init__ for backward compatibility) - Fix device_sn type hint: dict[str, str | None] (values can be None) - Use setdefault for device_info dict initialization - Use list comprehension for table_attribute_entry construction Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * refactor: use cls.__name__ for device type and @data_reader for device detection Replace device_type field (which gives inherited parent class names like "HarpOutputExpander" for Feeder) with cls.__name__ for DeviceType catalog entries. Detect devices by presence of @data_reader methods instead of device_type field, decoupling from the deprecated device_type property. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * ci: add unit test workflow for DJ pipeline PRs Runs pytest -m unit on PRs to datajoint_pipeline branch using uv. No database or golden datasets required. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * ci: add integration tests to DJ pipeline CI workflow Run unit and integration tests as parallel jobs. Integration tests use testcontainers MySQL (no golden datasets). Golden-dataset tests skip automatically in CI. 
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * docs: update SPEC files to match current implementation Sync both spec files with actual code after cls.__name__, @data_reader detection, and CI workflow changes. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * refactor(ephys): revamp primary keys to subject-centric design Add ProbeInsertion, TargetArea, InsertionTargetArea tables. Reparent EphysChunk and EphysBlock under ProbeInsertion instead of Experiment+Probe. Update ingest_chunks() signature to accept subject and insertion_number. Update all scripts (launch_si_gui, save_curation, run_aeon_spike_sorting) to use new PK fields. Pin datajoint==0.14.6. * feat(ephys): add unit matching and chunked spike output tables Add UnitMatchingMethod, UnitMatching, UniversalUnit (+Matched part), and ChunkedSpikeTimes tables to spike_sorting.py. UnitMatching.make() uses SpikeInterface's compare_two_sorters() to match units across temporally overlapping sessions via spike time coincidence. Update restore_raw_sorting() in spike_sorting_curation.py with 3-step unit matching cleanup (delete UnitMatching, delete Matched rows with force=True to bypass Part-table protection, delete orphaned UniversalUnit rows) before deleting SortedSpikes. * refactor(ephys): v2 PK revamp — remove subject from keys, add EphysEpoch/EphysSubject Primary key changed from (experiment_name, subject, insertion_number) to (experiment_name, insertion_number) throughout the ephys pipeline. 
Key changes: - ProbeInsertion: references Experiment (not Experiment.Subject), adds probe_label - EphysEpoch (Imported): auto-discovers probes from epoch files, creates ProbeInsertion (Probe must pre-exist — probe_type cannot be inferred from metadata) - EphysSubject (Manual): FK to Experiment.Subject for subject association - ingest_chunks: single-arg (experiment_name), loops ProbeInsertions, dynamic sync reader - NeuropixelsV2 StreamGroup added for V2 hardware sync file support - Ceph overwrite protection: FileExistsError checks in PreProcessing/SpikeSorting/PostProcessing - All spike times as datetime64[ns] (absolute) - Subject removed from all scripts (launch_si_gui, save_curation, run_aeon_spike_sorting) * test(ephys): add v2 setup and validation scripts for HPC testing ephys_v2_setup.py: 20-step test setup script (experiment creation through spike sorting) for validating the v2 pipeline against AEONX1/social-ephys0.1. ephys_v2_validate.py: comprehensive validation checks for PK structure, table contents, and data integrity after running the setup script. * docs: add unit matching design spec Covers schema design, matching algorithm, ownership convention for overlapping chunks, and re-processing workflows. Key design decisions: UnitMatchingMethod as non-PK FK, Matched as Part of UnitMatching, ChunkedSpikeTimes with unique index enforcement. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * docs: enforce temporal ordering via key_source + make() guard Replace soft "process in order" convention with two-level enforcement: key_source only yields earliest unprocessed block per insertion, make() guard rejects out-of-order direct calls. Safe under parallel populate(reserve_jobs=True). 
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix(spec): scope unique index per-insertion for multi-probe subjects The unique index on ChunkedSpikeTimes was (universal_unit, chunk_start), which would reject valid rows from different insertions that share the same universal_unit ID. Fixed to include the full insertion scope: (experiment_name, subject, insertion_number, universal_unit, chunk_start). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix(spec): use datetime64[ns] for ChunkedSpikeTimes spike_times Match the format used by SyncedSpikes.Unit rather than converting to float64 epoch seconds. Conversion to epoch seconds only happens internally during the matching algorithm for SpikeInterface compatibility. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * refactor(spec): rename Part tables and add electrode to Unit Rename Part tables for consistency with existing pipeline pattern: - UnitMatching.Matched → UnitMatching.Unit - UnitMatching.ChunkedSpikeTimes → UnitMatching.Spikes Add denormalized peak electrode FK (-> ephys.ElectrodeConfig.Electrode) to UnitMatching.Unit for query convenience. 
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * refactor(spec): rename UniversalUnit to GlobalUnit, move electrode to unit roster - Rename UniversalUnit → GlobalUnit, universal_unit → global_unit throughout - Move electrode FK from UnitMatching.Unit to GlobalUnit (ProbeType.Electrode) - Rewrite query patterns section around UX workflow: insertion → units → spikes - Add algorithm step for updating electrode on matched units Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * feat: implement unit matching system per spec - Replace UniversalUnit with GlobalUnit (Manual, electrode on ProbeType.Electrode) - Restructure UnitMatching: UnitMatchingMethod as non-PK FK - Add UnitMatching.Unit Part (replaces UniversalUnit.Matched) - Add UnitMatching.Spikes Part with ownership convention + unique index (replaces standalone ChunkedSpikeTimes Computed) - Enforce temporal ordering via key_source MIN(block_start) gate + make() guard - Simplify restore_raw_sorting(): no more force=True, clean cascade - Fix spec: master insert before parts (DataJoint requirement) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * refactor: replace 'session' terminology with 'EphysBlock'/'block' Use the pipeline's formal 'EphysBlock' terminology consistently across code comments, log messages, docstrings, and spec to avoid confusion with the ambiguous 'session' term. 
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * refactor(spec): remove subject from PK per upstream v2 revamp Update spec to reflect upstream ProbeInsertion PK change: (experiment_name, subject, insertion_number) → (experiment_name, insertion_number) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * refactor(spec): restore subject to PK, redesign EphysEpoch and EphysChunk - Restore subject to ProbeInsertion PK (surgical event model, probe-swap correctness) - Eliminate EphysSubject table (subject now structurally required in PK) - Redesign EphysEpoch as per-epoch probe registry with .Insertion Part table - Move probe_label from ProbeInsertion to EphysEpoch.Insertion (epoch-level detail) - Add EphysEpoch FK to EphysChunk for epoch traceability - Add subject guard to ingest_chunks (skip files without subject-probe mapping) - Update all PKs, unique indexes, query patterns, and ERD throughout spec Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * feat(ephys): restore subject to PK, redesign EphysEpoch and ingest_chunks Implement upstream schema changes per SPEC_unit_matching.md Section 2: - ProbeInsertion: restore subject to PK, remove probe_label - EphysSubject: eliminate (subject now structurally in PK) - EphysEpoch: redesign as per-epoch probe registry with .Insertion Part table, carry-forward semantics, auto-Probe creation - EphysChunk: add FK to EphysEpoch for epoch traceability - ingest_chunks(): epoch-aware file routing with subject guard - All downstream insertion_key, key_source, unique index include subject - PreProcessing.infer_output_dir() includes subject in path - Update support scripts for new schema structure Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * stub _read_probe_assignments as NotImplementedError placeholder The exact file format and carry-forward logic will be determined once experimental data conventions are finalized. 
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * refactor(ephys): extract helpers into utils/ephys_utils.py Move probe discovery, metadata parsing, sync model processing, and probe type creation out of ephys.py into a dedicated utils module. ephys.py now contains only table definitions and orchestration logic. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * feat(spike_sorting): add UnitMatchingParamSet for parameterizable matching - New UnitMatchingParamSet Lookup (method + params longblob) - UnitMatching: paramset in PK, enabling parallel matching runs - GlobalUnit: paramset as non-PK FK (provenance, not identity) - key_source serialized per (insertion, paramset) - Spec updated with new table, ERD, and design rationale Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * feat(spike_sorting): add pair-based unit matching alternative (UnitMatchingPair + PairMatching) Implements an alternative to the sequential UnitMatching design where users explicitly specify which two blocks to compare. Key design choices: - UnitMatchingPair (Manual): user picks two SyncedSpikes + paramset - PairMatching (Computed): performs matching for the specified pair - PairMatching.Unit: keyed by GlobalUnit, records both earlier_unit and latter_unit via nullable projected FKs to SortedSpikes.Unit - Canonical ordering (earlier < latter) prevents duplicate pairs - Anchor/seed logic determines which block has existing GlobalUnit assignments Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * refactor(spike_sorting): replace PairMatching with seed-based bidirectional UnitMatching Add seed_block_start to UnitMatchingParamSet so scientists can control where matching starts (best signal quality block). Rewrite key_source as pure SQL with seed-aware frontier logic that propagates outward in both directions. Replace temporal ordering guard with seed-first + overlap check. Remove PairMatching/UnitMatchingPair (rejected alternative). 
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * feat: upgrade core pipeline to datajoint 2.x - Update DJ config from dict-based to Pydantic attribute-based API - Convert fetch patterns: fetch(as_dict=True) -> to_dicts(), fetch("col") -> to_arrays(), fetch(format="frame") -> to_pandas(), fetch("KEY") -> keys() - Fix blob detection: type == "longblob" -> is_blob (handles DJ 2.x <blob> codec) - Convert bare int/float types to int32/float32 in table definitions - Replace dj.schema() with dj.Schema(), populate(limit=) with populate(max_calls=) - Fix DJ 2.x lineage check in load_metadata aggr join (semantic_check=False) - Use dynamic module reference in paths.py for repository_config patching - Simplify test fixtures: set database_prefix before pipeline import so schemas activate naturally with test prefix, eliminating manual re-decoration - Remove populate/worker.py (deferred to future work) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * chore: bump version to 0.4.0 for datajoint 2.x upgrade Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * chore: update uv.lock for v0.4.0 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: update streams.py longblob -> <blob> for DJ 2.x compatibility The auto-generated streams.py had legacy longblob type which DJ 2.x treats as raw bytes (no serialization). Replace with <blob> to match the streams_maker.py template. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * Update aeon/dj_pipeline/docs/specs/SPEC_unit_matching.md Co-authored-by: Elissa Sutlief <elissasutlief@gmail.com> * fix: strengthen stream data tests to assert sample_count > 0 Tests previously only checked that stream entries existed, not that they contained actual data. All streams had sample_count=0 due to empty reader pattern prefixes (see #536). Now tests assert sample_count > 0. Also adds fetch_stream integration tests and updates uv.lock for aeon_exp_foragingABC Rig fix (DataSchema + BaseSchema inheritance). 
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: strengthen stream tests and add fetch_stream integration tests - Assert sample_count > 0 in stream data tests (not just entry existence) - Add TestFetchStream class with 5 tests: DataFrame return, video columns, harp columns, drop_pk behavior, and timestamp rounding - Update uv.lock for aeon_exp_foragingABC Rig fix (BaseSchema inheritance) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * Adapt test scripts for tn/ephys_revamp_v2 and add auto-approval curation path Setup/validate scripts updated for v2 PK structure (subject in PK), new table names (GlobalUnit, UnitMatching.Spikes), manual ProbeInsertion setup, single electrode group, and consolidated Phase 3 steps. ApplyOfficialCuration.make() now handles auto-approved curations natively (no file + parent_curation_id=-1) so the setup script uses .populate() instead of allow_direct_insert=True. * Update DB prefix to u_elissas_aeon_ephys_v2_test_ to match existing grant * Insert Subject before Experiment.Subject (FK dependency) * Fix SQL syntax error: remove quotes from TargetArea comment MariaDB chokes on double quotes inside DJ column comments. * Insert ProbeType manually (HPC has no internet for probeinterface) * Skip epoch ingestion, insert Epoch manually in step 5 Ephys-only data (social-ephys0.1) has no CameraTop/FrameTop chunk files for Epoch.ingest_epochs() to discover. Insert the Epoch entry directly from the known epoch directory name in step 5 alongside ProbeInsertion and EphysEpoch. * Update SLURM scripts for v2 test: add subject to key, use uv instead of conda * Pin spikeinterface to esutlie fork to fix Singularity pip install quoting bug * refactor: adopt aeon_exp package path convention Update devices_schema import strings from swc.aeon.exp to swc.aeon_exp to match the convention established in the production linear-drive branch of aeon_exp_foragingABC. Update uv.lock to latest tn/data-api-updated. 
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * refactor: adopt aeon_exp package path convention Update devices_schema import strings from swc.aeon.exp to swc.aeon_exp to match the convention established in the production linear-drive branch of aeon_exp_foragingABC. Update uv.lock to latest tn/data-api-updated. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * Run spike sorting for remaining 3 blocks in a single SLURM job * Fix CurationMethod contents: each entry must be a separate tuple * Fix UnitMatching.key_source: project to PK before union to avoid secondary attribute conflict * Add unit matching QC plots: gantt chart, heatmap, yield, longevity, spike consistency * Add diagnostic script to debug unit matching failures * Add repair script to re-ingest chunks with missing SyncModel entries * Fix repair script: delete EphysBlockInfo before EphysChunk (part table constraint) * Fix timezone-aware datetime in SyncModel insert (MariaDB rejects +00:00 suffix) * Fix get_matching() API: returns tuple of two Series, not a dict * QC gantt chart: solid lines spanning blocks, thinner and more compact * Gantt chart: give single-block units a minimum bar width * Gantt chart: bars span full block width edge-to-edge * Add targeted diagnostic for block 0↔1 matching failure Replicates UnitMatching.make() logic step-by-step, printing intermediate results. Also compares at various delta_time thresholds to distinguish code bug from genuine lack of matching units. * Fix: get_matching() returns Series, not dict — iterate values directly * Add temporal offset measurement to block matching diagnostic * Add chunk-level spike time comparison to distinguish sync vs sorting issue * Add diagnostic script to measure blob sizes in ephys tables Queries actual LENGTH() of longblob columns to identify candidates for externalization to blob@dj_store. 
* Add fallback to old test schema for blob size measurement Checks elissas_aeon_ephys_test_ tables when v2 test tables don't exist, so we can measure SortedSpikes and SyncedSpikes. * Move large spike blobs to external storage (blob@dj_store) Switch longblob → blob@dj_store for columns averaging >100 KB: - SortedSpikes.Unit: spike_indices, spike_sites, spike_depths - SyncedSpikes.Unit: spike_times - UnitMatching.Spikes: spike_times Data stored as files on ceph (/ceph/aeon/aeon/dj_store), DB keeps only a 16-byte UUID reference. Reduces DB size significantly for units with ~200K+ spikes per entry. * Clean up scripts for PR - Remove one-off debug/diagnostic scripts: debug_block01_matching.py, debug_unit_matching.py, repair_sync_models.py, measure_blob_sizes.py - Replace hardcoded DB prefix with config-based safety check that blocks production prefix and host - Add single/multi block modes to run_aeon_spike_sorting.py * Pin spikeinterface to upstream main (fix merged as PR #4438) The Singularity container pip install fix (esutlie/spikeinterface fix/pip-direct-url-quotes) has been merged upstream as SpikeInterface/spikeinterface#4438. Switch from fork pin to upstream main at the merge commit. Can move to a release pin (0.104.0) when it's available. 
* feat: implement AeonStreamCodec for lazy-loading stream data Replace blob-based storage in auto-generated stream tables with a codec-based approach: - Individual data columns store JSON summary stats (min, max, mean, dtype) - timestamps column stores JSON with time range and sampling rate - stream_df column uses <aeon_stream> codec: stores JSON reference, returns full DataFrame from raw files on fetch - ~3700x storage reduction in MySQL (~1.6MB → ~0.4KB per row) Changes: - New: aeon/dj_pipeline/utils/codec.py (AeonStreamCodec, column_stats, timestamp_stats) - Updated: streams_maker.py table definition and make() method - Updated: fetch_stream() to support both codec and legacy blob tables - Updated: test fixtures to handle streams.py regeneration - New: TestCodecStreamData regression tests - New: test_codec.py unit tests for helper functions - Added: scripts/test_load_stream.py prototype/validation script Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * refactor: remove legacy blob path from fetch_stream Full switch to codec-based stream tables — no backward compatibility needed for blob-based tables. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * cleanup: remove median_dt_ms from timestamp stats sampling_rate_hz is sufficient; median_dt_ms is redundant (inverse). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * chore: update aeon_exp_foragingABC to main branch, swc-aeon to v0.2.0 aeon_exp_foragingABC main is now the production branch (tn/data-api-updated merged). swc-aeon updated to v0.2.0 to satisfy new dependency requirement. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix(ci): split test extras to avoid private repo dependency in CI Move swc-aeon-rigs-foragingabc from [test] to [test-golden] extra. CI uses --extra test (no private repo access needed), local dev uses --extra test-golden for golden dataset tests. 
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix(ci): skip foragingABC-dependent tests when package not installed TestPopulateCatalogFromPydantic imports swc.aeon_exp at runtime via get_experiment_pydantic(). Skip gracefully in CI where the private swc-aeon-rigs-foragingabc package is not installed. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: pin requires-python to <3.13 for numpy compatibility numpy<2 from aeon_exp_foragingABC is unsolvable for Python 3.13+. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * Temporarily allow PRs to all branches to trigger test_dj_pipeline workflow * Migrate pytest.ini to pyproject.toml and ignore specific warnings * Resolve deprecated positional inserts * Skip full ingestion tests if aeon_exp package is unavailable * Pin spikeinterface>= .104.0 * Collect all docs in root level docs dir * Move and rename spike sorting curation specifications * Fix typo in aeon/dj_pipeline/ephys.py Co-authored-by: Elissa Sutlief <elissasutlief@gmail.com> * feat: rework `streams_maker` utils to be compatible with pydantic-based Experiment/Rig definition * docs: clean up STREAMS_MAKER.md * chore: update swc-aeon dependency to version 0.1.0 and add source configuration for uv * refactor: streamline metadata ingestion in EpochConfig by consolidating imports and updating metadata file handling * chore: update version to 0.2.2 and remove unused schema definitions in ingestion_schemas.py * test: add integration tests for load_new_metadata with testcontainers - Add testcontainers MySQL fixture for zero-config database testing - Create 24 integration tests covering: - extract_stream_types_from_device (closure handling for real @data_reader) - get_device_info and get_stream_entries - insert_stream_types with duplicate handling - insert_device_types with FK constraint recovery - Create 17 unit tests for pure functions (no DB required) - Use real @data_reader decorator from swc.aeon.schema (no mocking) - Fix 
extract_stream_types_from_device to check closure for original function - Add SPEC_TESTING.md documenting test architecture and patterns - Remove swc-aeon-rigs from test dependencies (not needed) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> * docs: move STREAMS_MAKER spec to docs/specs/ Relocate spec from aeon/dj_pipeline/utils/STREAMS_MAKER.md to docs/specs/SPEC_STREAMS_MAKER.md for consistency with SPEC_TESTING.md. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> * refactor: move dict_to_uuid to utils module Move dict_to_uuid function from aeon/dj_pipeline/__init__.py to aeon/dj_pipeline/utils/__init__.py for better code organization. Re-export from __init__.py for backward compatibility. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> * refactor: rename load_new_metadata to load_metadata Remove "new" suffix now that legacy load_metadata.py no longer exists. Update all imports, test files, and spec documentation. Files renamed: - aeon/dj_pipeline/utils/load_new_metadata.py → load_metadata.py - tests/.../test_load_new_metadata_unit.py → test_load_metadata_unit.py - tests/.../test_load_new_metadata_integration.py → test_load_metadata_integration.py Also adds uv.lock for reproducible builds. 
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> * docs: clarify test classification and add sample metadata fixture - Restore corrupted spec files (SPEC_TESTING.md, SPEC_STREAMS_MAKER.md) - Clarify test tiers: unit, integration (schema + ingestion), specialized - Add golden datasets vs sample fixtures distinction - Add ForagingABC_Metadata.json as sample fixture for parsing tests - Add 17 new tests for ForagingABC metadata parsing - Update pytest markers to reflect clearer classification - Bump version to 0.2.3 Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> * refactor: extract experiment-agnostic core DataJoint pipeline Remove experiment-specific code to create a clean, reusable pipeline template: Removed: - aeon/schema/ - obsolete DotMap-based device schemas (replaced by Pydantic) - dj_pipeline/analysis/ - experiment-specific analysis tables - dj_pipeline/scripts/ - one-off maintenance scripts - dj_pipeline/create_experiments/ - experiment-specific setup scripts - dj_pipeline/report.py - experiment-specific reporting - utils/block_plotting.py, plotting.py, tracking_utils.py Modified: - acquisition.py: remove Environment tables (to be Pydantic devices) - tracking.py: keep only SLEAPTracking, add _get_stream_reader() helper - worker.py/process.py: remove analysis_worker - streams_maker.py: add resilience for missing DB connection Added: - utils/create_experiment.py: generic experiment creation utility Core schemas retained: lab, subject, acquisition, tracking (SLEAP), streams (via streams_maker), qc Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> * refactor: upgrade streams_maker to Pydantic-based approach - Change StreamType PK from stream_type to stream_hash (UUID) to handle same stream_type with different reader implementations - Update EpochConfig.Meta to use JSON type for metadata field storing original rig_config for Pydantic reconstruction - Add get_stream_reader_for_epoch() helper for runtime reader resolution from database-stored 
metadata without file I/O - Update get_device_stream_template() make() method to use Pydantic approach - Remove legacy aeon_schemas references from generated streams.py - Update SPEC_STREAMS_MAKER.md with new architecture and design decisions Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> * test: update integration tests for new streams schema - Update test assertions to match new schema (stream_hash PK, no kwargs) - Fix test fixture to drop existing schema before creating tables - Update spec to match actual get_device_info() return structure Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> * refactor: simplify Pydantic schema loading with module:Class format - Rename get_experiment_class() to get_experiment_pydantic() - Use colon separator format (module.path:ClassName) for schema paths - Remove extract_rig_from_metadata() and extract_rig_from_metadata_dict() - Simplify get_stream_reader_for_epoch() to use Experiment.model_validate() - Remove duplicate _get_stream_reader() from tracking.py - Update DevicesSchema docstring to clarify new format - Update SPEC_STREAMS_MAKER.md with new architecture and design decisions Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> * feat: add stream_reader_kwargs to StreamType for reader constructor args Implement dynamic storage and retrieval of reader constructor kwargs (e.g., BitmaskEvent needs value/tag, Harp needs columns) using inspect.signature to extract parameters from reader instances. 
Key changes: - Add stream_reader_kwargs field to StreamType catalog table - Add _extract_kwargs_from_reader() using inspect.signature - Add get_reader_kwargs_from_device_class() for @data_reader methods - Update populate_catalog_from_pydantic() to extract and store kwargs - Update get_device_stream_template() to use stored kwargs Architecture changes (Three Decoupled Steps): - Step 1: Catalog population at worker startup (not in EpochConfig.make) - Step 2: Table creation at worker startup via streams_maker.main() - Step 3: Data population via make() calls (DML only) Test changes: - Fix fixture ordering: streams_maker.main(create_tables=False) before populate_catalog_from_pydantic() - Add golden dataset integration tests (test_full_ingestion.py) - Update test expectations for stream_reader_kwargs field Previously failing streams now work: - HarpOutputExpanderDeliverPellet (BitmaskEvent with value/tag) - HarpOutputExpanderBeamBreak (BitmaskEvent with value/tag) - WeightScaleBaselineEvent (Harp with columns) - HarpOutputExpanderRetriedDelivery (BitmaskEvent with value/tag) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> * docs: update SPEC_STREAMS_MAKER to match implementation - Update _extract_kwargs_from_reader() to show inspect.signature approach instead of hardcoded attribute checks - Rename get_reader_kwargs_from_method to get_reader_kwargs_from_device_class - Clarify stream_hash uses only (stream_type, stream_reader), not kwargs - Update edge case #3 to reflect kwargs not included in hash Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> * refactor: clean up load_metadata and SPEC documentation Code consolidation (~430 lines removed): - Remove extract_stream_types_from_device() - inline as list comprehension - Remove _infer_device_type_from_rig() - redundant with _extract_device_mapper_from_rig() - Remove extract_epoch_config() and ingest_epoch_metadata() - dead code - Remove unused imports (DotMap, io_api, np) - Consolidate get_device_info() 
device iteration into single loop Documentation cleanup: - Update module docstring to describe current architecture - Remove "Step 1/2/3" SPEC terminology from docstrings - Condense historical justifications in SPEC_STREAMS_MAKER.md - Remove "old vs new" design comparisons - Update SPEC_TESTING.md to reflect removed functions Test updates: - Rename TestExtractStreamTypesFromDevice to TestGetDataReaderMethods - Remove TestInferDeviceTypeFromRig (function removed) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> * refactor: remove legacy tests in favor of testcontainers-based tests Remove 4 legacy test files that required dj_local_conf.json and exp0.2 data: - test_acquisition.py (redundant with TestEpochIngestion, TestChunkIngestion) - test_pipeline_instantiation.py (low value, implicitly tested) - test_qc.py (trivial count assertion) - test_tracking.py (exp0.2-specific with unused save_test_data function) Clean up conftest.py: - Remove orphaned fixtures (_dj_config, pipeline, _experiment_creation, etc.) - Remove helper functions (load_pipeline, drop_schema, data_dir) - Remove test_params fixture with hard-coded exp0.2 values Update SPEC_TESTING.md: - Update directory structure to reflect current state - Rename test_ingestion_golden.py to test_full_ingestion.py - Remove unused slow marker -357 lines removed, test coverage maintained via new testcontainers tests. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> * feat: add DeviceName table as primary key for ExperimentDevice tables Replace device_serial_number with device_name as the primary key for ExperimentDevice tables. This aligns with Pydantic rig configurations where device names (e.g., "CameraTop", "Feeder1") are the natural identifiers. 
Changes:
- Add DeviceName lookup table (device_name PK, FK to DeviceType)
- Update get_device_template() to use ->DeviceName as PK
- Make device_serial_number an optional attribute (nullable)
- Update insert_device_types() to populate DeviceName table
- Update ingest_epoch_metadata_from_rig() to use device_name as key
- Delete streams.py (auto-regenerated with new schema)

Benefits:
- Natural query keys: & {"device_name": "CameraTop"}
- Devices without serial numbers (e.g., LightCycle) now supported
- COM port uniqueness issue resolved (device_name is unique per experiment)
- Hardware tracking preserved via optional device_serial_number attribute

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* fix: strip timezone from chunk timestamps for naive datetime comparison

io_api.load() returns UTC-aware timestamps (via pd.to_datetime(utc=True)), but epoch directory names are parsed with datetime.strptime(), which returns naive timestamps. Comparing the two raises TypeError. MySQL datetime columns also reject timezone offsets (+00:00).

Fix: call tz_localize(None) on chunk data immediately after loading, in both ingest_epochs and ingest_chunks. This strips the UTC offset while preserving the datetime values, keeping all downstream comparisons and DB inserts naive.

* fix: handle MariaDB json-as-longtext for metadata insert and fetch

MariaDB 10.3 aliases the `json` column type to `longtext`. DataJoint reads the actual MySQL column type, so attr.json is False and the automatic json.dumps()/json.loads() serialization never fires.

Insert side (acquisition.py): manually json.dumps(epoch_config["metadata"]) before EpochConfig.Meta.insert1(). Placed after ingest_epoch_metadata_from_rig(), which needs the dict.

Fetch side (load_metadata.py): guard with isinstance(str) check and json.loads() in get_stream_reader_for_epoch() before passing metadata to Pydantic model_validate(), which expects a dict.
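The naive-versus-aware mismatch behind the timezone fix can be reproduced with the standard library alone. This is only a sketch: the commit itself uses pandas `tz_localize(None)`, and `datetime.replace(tzinfo=None)` is used here as the stdlib analogue. The directory-name format and the timestamps are made up for illustration.

```python
from datetime import datetime, timezone

# Naive timestamp, as produced by parsing a directory name (hypothetical format).
epoch_start = datetime.strptime("2024-01-31T11-00-00", "%Y-%m-%dT%H-%M-%S")

# UTC-aware timestamp, standing in for a value loaded with utc=True.
chunk_time = datetime(2024, 1, 31, 12, 0, 0, tzinfo=timezone.utc)

# Ordering comparisons between naive and aware datetimes raise TypeError.
try:
    chunk_time > epoch_start
    comparison_failed = False
except TypeError:
    comparison_failed = True

# Dropping tzinfo keeps the wall-clock values and makes both sides naive.
naive_chunk_time = chunk_time.replace(tzinfo=None)
is_later = naive_chunk_time > epoch_start
```

Stripping the offset right after loading, as the commit does, keeps every later comparison and database insert on one convention and avoids converting at each call site.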
* fix: handle stale module state and json metadata in integration tests The full_pipeline fixture now patches db_prefix and streams_maker.schema_name to match the golden test config, clears cached streams module, and re-activates pipeline schemas when other integration tests have set a different prefix. Also handle MariaDB json-as-longtext for metadata assertions in tests. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: clean streams.py to catalog-only baseline for stable test imports streams.py previously contained experiment-specific dynamic tables that caused test failures due to stale schema references on import. Now the committed file has only catalog tables (StreamType, DeviceType, DeviceName, Device) while streams_maker.main() continues to append dynamic tables at runtime. Test fixtures use schema.activate() with database reset to rebind schemas across different test prefixes without needing file deletion. Also fixes ruff lint/format violations, adds save/restore of db_prefix in fixture teardown to prevent state leakage, and adds logging to teardown exception handlers. 
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * refactor: address PR #524 review feedback for load_metadata.py - Move `import typing` and `import re` to module-level imports instead of importing inside individual functions - Remove duplicate `import json` inside get_stream_reader_for_epoch (already imported at module level) - Fix Pydantic V3 deprecation: use `type(rig).model_fields` instead of `rig.model_fields` (instance-level access will break in Pydantic V3) - Fix ruff B007: rename unused loop variables to underscore-prefixed - Apply ruff format to normalize quote style Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * refactor: address remaining PR #524 review feedback - Move dict_to_uuid to utils/hashing.py (re-exported from utils/__init__ for backward compatibility) - Fix device_sn type hint: dict[str, str | None] (values can be None) - Use setdefault for device_info dict initialization - Use list comprehension for table_attribute_entry construction Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * refactor: use cls.__name__ for device type and @data_reader for device detection Replace device_type field (which gives inherited parent class names like "HarpOutputExpander" for Feeder) with cls.__name__ for DeviceType catalog entries. Detect devices by presence of @data_reader methods instead of device_type field, decoupling from the deprecated device_type property. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * ci: add unit test workflow for DJ pipeline PRs Runs pytest -m unit on PRs to datajoint_pipeline branch using uv. No database or golden datasets required. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * ci: add integration tests to DJ pipeline CI workflow Run unit and integration tests as parallel jobs. Integration tests use testcontainers MySQL (no golden datasets). Golden-dataset tests skip automatically in CI. 
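The cls.__name__ change mentioned above comes down to how inherited class attributes behave. The sketch below uses plain Python classes as hypothetical stand-ins for the Pydantic device models; only the names "HarpOutputExpander" and "Feeder" come from the commit message.

```python
class HarpOutputExpander:
    """Hypothetical parent device carrying a device_type label."""

    device_type = "HarpOutputExpander"


class Feeder(HarpOutputExpander):
    """Subclass that does not override device_type."""


feeder = Feeder()

# The inherited attribute still reports the parent's name...
inherited_label = feeder.device_type

# ...while the class name identifies the concrete device class.
class_label = type(feeder).__name__
```

Keying the DeviceType catalog on the class name means a subclass gets its own entry even when it forgets to override the inherited field.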
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * docs: update SPEC files to match current implementation Sync both spec files with actual code after cls.__name__, @data_reader detection, and CI workflow changes. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: strengthen stream tests and add fetch_stream integration tests - Assert sample_count > 0 in stream data tests (not just entry existence) - Add TestFetchStream class with 5 tests: DataFrame return, video columns, harp columns, drop_pk behavior, and timestamp rounding - Update uv.lock for aeon_exp_foragingABC Rig fix (BaseSchema inheritance) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * refactor: adopt aeon_exp package path convention Update devices_schema import strings from swc.aeon.exp to swc.aeon_exp to match the convention established in the production linear-drive branch of aeon_exp_foragingABC. Update uv.lock to latest tn/data-api-updated. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * chore: update aeon_exp_foragingABC to main branch, swc-aeon to v0.2.0 aeon_exp_foragingABC main is now the production branch (tn/data-api-updated merged). swc-aeon updated to v0.2.0 to satisfy new dependency requirement. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix(ci): split test extras to avoid private repo dependency in CI Move swc-aeon-rigs-foragingabc from [test] to [test-golden] extra. CI uses --extra test (no private repo access needed), local dev uses --extra test-golden for golden dataset tests. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix(ci): skip foragingABC-dependent tests when package not installed TestPopulateCatalogFromPydantic imports swc.aeon_exp at runtime via get_experiment_pydantic(). Skip gracefully in CI where the private swc-aeon-rigs-foragingabc package is not installed. 
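One way to build the "skip when the private package is missing" condition is a small availability check. This is a sketch, not the repository's code: `is_installed` is a hypothetical helper, and its result could feed a pytest skip marker or `unittest.skipUnless`.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if ``module_name`` can be located without fully importing it.

    For dotted names, find_spec() imports the parent package first and raises
    ModuleNotFoundError when that parent is missing, so the error is caught.
    """
    try:
        return importlib.util.find_spec(module_name) is not None
    except ModuleNotFoundError:
        return False


# A stdlib module is always present; the dotted name below is assumed absent.
has_json = is_installed("json")
has_private_pkg = is_installed("not_a_real_private_pkg_xyz.rigs")
```

The try/except matters for dotted names such as swc.aeon_exp: without it, the check itself would fail in CI when the parent namespace is not installed.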
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: pin requires-python to <3.13 for numpy compatibility numpy<2 from aeon_exp_foragingABC is unsolvable for Python 3.13+. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * Use module level pytestmark * Remove tox.ini - config should go into pyproject.toml * Run ryuk as a privileged container * Reorganise dependencies in pyproject.toml * Combine uv setup and python installation in test_dj_pipeline workflow * Update uv.lock * Update pre-commit hook versions * Remove removed UP038 rule * Remove unused dict_to_uuid import * Remove non-existent and unused imports + ruff errors - PLR2004 Magic value - SIM102 Use a single `if` instead of nested - E501 Line too long * Update uv.lock * Move example usage to create_experiment docstring * Store reused string as variable * Fix device_sn return typehint * Fix unrecognised Pyright settings * Add default excludes to pyright config * Enable pyright for dj_pipeline, disable for tests and docs * Refactor require_golden_data fixture * Add ephys pipeline (#545) Add ephys schemas (v0.2.0). Adds spike sorting, manual curation, and cross-session unit matching capabilities to aeon mecha. * fix: remove auto-generated stream tables from streams.py Keep only the header + catalog tables (StreamType, DeviceType, DeviceName, Device). Generated stream tables (FeederEncoder, CameraVideo, etc.) are created at runtime by streams_maker.main() and should not be committed. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix(ci): add dj_config_integration fixture to codec encode tests The encode tests need the mysql_container running before importing AeonStreamCodec (which triggers datajoint import/connection). Adding dj_config_integration ensures the testcontainers MySQL is ready. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: rename DJ_DB_PREFIX to DJ_DATABASE_PREFIX for consistency Aligns env var name with dj.config.database.database_prefix convention. 
Addresses PR #537 review comment.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* refactor: migrate ephys pipeline modules to DJ 2.x fetch API
- fetch(as_dict=True) → to_dicts()
- fetch("col") → to_arrays("col")
- fetch("KEY") → keys()
- fetch(format="frame") → to_pandas()
- fetch(..., limit=1, as_dict=True)[0] → (query & dj.Top(limit=1)).fetch1()
- fetch(..., download_path=) → dj.config.override(download_path=) context
- fetch("KEY", "col1", "col2") → to_dicts() + extract keys/values

Files: ephys.py, spike_sorting.py, spike_sorting_curation.py, ephys_utils.py

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* refactor: migrate ephys scripts to DJ 2.x fetch API

Same patterns as pipeline modules, plus:
- dj.config["custom"]["database.prefix"] → dj.config.database.database_prefix
- fetch(..., limit=1)[0] → (query & dj.Top(limit=1)).fetch1()
- fetch("col1", "col2", as_dict=True) → .proj("col1", "col2").to_dicts()

Files: ephys_mock_ingestion.py, ephys_v2_qc_plots.py, ephys_v2_setup.py, ephys_v2_validate.py

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* docs: update spec examples to DJ 2.x fetch API

Update code examples in SPEC_unit_matching.md and SPEC_STREAMS_MAKER.md to use to_dicts(), to_arrays() instead of deprecated fetch() patterns.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: clean streams.py — keep only catalog tables, remove test-generated code

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* docs: add note about test-generated streams.py in SPEC_TESTING

Reminds developers to discard test-generated stream tables from streams.py before committing.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * feat: rework `streams_maker` utils to be compatible with pydantic-based Experiment/Rig definition * docs: clean up STREAMS_MAKER.md * chore: update swc-aeon dependency to version 0.1.0 and add source configuration for uv * refactor: streamline metadata ingestion in EpochConfig by consolidating imports and updating metadata file handling * chore: update version to 0.2.2 and remove unused schema definitions in ingestion_schemas.py * test: add integration tests for load_new_metadata with testcontainers - Add testcontainers MySQL fixture for zero-config database testing - Create 24 integration tests covering: - extract_stream_types_from_device (closure handling for real @data_reader) - get_device_info and get_stream_entries - insert_stream_types with duplicate handling - insert_device_types with FK constraint recovery - Create 17 unit tests for pure functions (no DB required) - Use real @data_reader decorator from swc.aeon.schema (no mocking) - Fix extract_stream_types_from_device to check closure for original function - Add SPEC_TESTING.md documenting test architecture and patterns - Remove swc-aeon-rigs from test dependencies (not needed) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> * docs: move STREAMS_MAKER spec to docs/specs/ Relocate spec from aeon/dj_pipeline/utils/STREAMS_MAKER.md to docs/specs/SPEC_STREAMS_MAKER.md for consistency with SPEC_TESTING.md. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> * refactor: move dict_to_uuid to utils module Move dict_to_uuid function from aeon/dj_pipeline/__init__.py to aeon/dj_pipeline/utils/__init__.py for better code organization. Re-export from __init__.py for backward compatibility. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> * refactor: rename load_new_metadata to load_metadata Remove "new" suffix now that legacy load_metadata.py no longer exists. Update all imports, test files, and spec documentation. 
Files renamed: - aeon/dj_pipeline/utils/load_new_metadata.py → load_metadata.py - tests/.../test_load_new_metadata_unit.py → test_load_metadata_unit.py - tests/.../test_load_new_metadata_integration.py → test_load_metadata_integration.py Also adds uv.lock for reproducible builds. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> * docs: clarify test classification and add sample metadata fixture - Restore corrupted spec files (SPEC_TESTING.md, SPEC_STREAMS_MAKER.md) - Clarify test tiers: unit, integration (schema + ingestion), specialized - Add golden datasets vs sample fixtures distinction - Add ForagingABC_Metadata.json as sample fixture for parsing tests - Add 17 new tests for ForagingABC metadata parsing - Update pytest markers to reflect clearer classification - Bump version to 0.2.3 Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> * refactor: extract experiment-agnostic core DataJoint pipeline Remove experiment-specific code to create a clean, reusable pipeline template: Removed: - aeon/schema/ - obsolete DotMap-based device schemas (replaced by Pydantic) - dj_pipeline/analysis/ - experiment-specific analysis tables - dj_pipeline/scripts/ - one-off maintenance scripts - dj_pipeline/create_experiments/ - experiment-specific setup scripts - dj_pipeline/report.py - experiment-specific reporting - utils/block_plotting.py, plotting.py, tracking_utils.py Modified: - acquisition.py: remove Environment tables (to be Pydantic devices) - tracking.py: keep only SLEAPTracking, add _get_stream_reader() helper - worker.py/process.py: remove analysis_worker - streams_maker.py: add resilience for missing DB connection Added: - utils/create_experiment.py: generic experiment creation utility Core schemas retained: lab, subject, acquisition, tracking (SLEAP), streams (via streams_maker), qc Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> * refactor: upgrade streams_maker to Pydantic-based approach - Change StreamType PK from stream_type to stream_hash 
(UUID) to handle same stream_type with different reader implementations - Update EpochConfig.Meta to use JSON type for metadata field storing original rig_config for Pydantic reconstruction - Add get_stream_reader_for_epoch() helper for runtime reader resolution from database-stored metadata without file I/O - Update get_device_stream_template() make() method to use Pydantic approach - Remove legacy aeon_schemas references from generated streams.py - Update SPEC_STREAMS_MAKER.md with new architecture and design decisions Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> * test: update integration tests for new streams schema - Update test assertions to match new schema (stream_hash PK, no kwargs) - Fix test fixture to drop existing schema before creating tables - Update spec to match actual get_device_info() return structure Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> * refactor: simplify Pydantic schema loading with module:Class format - Rename get_experiment_class() to get_experiment_pydantic() - Use colon separator format (module.path:ClassName) for schema paths - Remove extract_rig_from_metadata() and extract_rig_from_metadata_dict() - Simplify get_stream_reader_for_epoch() to use Experiment.model_validate() - Remove duplicate _get_stream_reader() from tracking.py - Update DevicesSchema docstring to clarify new format - Update SPEC_STREAMS_MAKER.md with new architecture and design decisions Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> * feat: add stream_reader_kwargs to StreamType for reader constructor args Implement dynamic storage and retrieval of reader constructor kwargs (e.g., BitmaskEvent needs value/tag, Harp needs columns) using inspect.signature to extract parameters from reader instances. 
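The colon-separated schema path format (module.path:ClassName) mentioned above can be resolved with importlib. The sketch below is a generic loader under that assumption. `load_class` is a hypothetical name rather than the pipeline's `get_experiment_pydantic()`, and a standard-library class stands in for an experiment schema.

```python
import importlib


def load_class(path: str) -> type:
    """Resolve a 'module.path:ClassName' string to the class object."""
    module_path, sep, class_name = path.partition(":")
    if not sep or not module_path or not class_name:
        raise ValueError(f"expected 'module.path:ClassName', got {path!r}")
    module = importlib.import_module(module_path)
    return getattr(module, class_name)


# Stand-in target: a stdlib class instead of a real experiment schema.
ordered_dict_cls = load_class("collections:OrderedDict")
```

The colon gives an unambiguous boundary between the module path and the class name, so the loader never has to guess which dot ends the module path.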
Key changes: - Add stream_reader_kwargs field to StreamType catalog table - Add _extract_kwargs_from_reader() using inspect.signature - Add get_reader_kwargs_from_device_class() for @data_reader methods - Update populate_catalog_from_pydantic() to extract and store kwargs - Update get_device_stream_template() to use stored kwargs Architecture changes (Three Decoupled Steps): - Step 1: Catalog population at worker startup (not in EpochConfig.make) - Step 2: Table creation at worker startup via streams_maker.main() - Step 3: Data population via make() calls (DML only) Test changes: - Fix fixture ordering: streams_maker.main(create_tables=False) before populate_catalog_from_pydantic() - Add golden dataset integration tests (test_full_ingestion.py) - Update test expectations for stream_reader_kwargs field Previously failing streams now work: - HarpOutputExpanderDeliverPellet (BitmaskEvent with value/tag) - HarpOutputExpanderBeamBreak (BitmaskEvent with value/tag) - WeightScaleBaselineEvent (Harp with columns) - HarpOutputExpanderRetriedDelivery (BitmaskEvent with value/tag) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> * docs: update SPEC_STREAMS_MAKER to match implementation - Update _extract_kwargs_from_reader() to show inspect.signature approach instead of hardcoded attribute checks - Rename get_reader_kwargs_from_method to get_reader_kwargs_from_device_class - Clarify stream_hash uses only (stream_type, stream_reader), not kwargs - Update edge case #3 to reflect kwargs not included in hash Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> * refactor: clean up load_metadata and SPEC documentation Code consolidation (~430 lines removed): - Remove extract_stream_types_from_device() - inline as list comprehension - Remove _infer_device_type_from_rig() - redundant with _extract_device_mapper_from_rig() - Remove extract_epoch_config() and ingest_epoch_metadata() - dead code - Remove unused imports (DotMap, io_api, np) - Consolidate get_device_info() 
device iteration into single loop Documentation cleanup: - Update module docstring to describe current architecture - Remove "Step 1/2/3" SPEC terminology from docstrings - Condense historical justifications in SPEC_STREAMS_MAKER.md - Remove "old vs new" design comparisons - Update SPEC_TESTING.md to reflect removed functions Test updates: - Rename TestExtractStreamTypesFromDevice to TestGetDataReaderMethods - Remove TestInferDeviceTypeFromRig (function removed) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> * refactor: remove legacy tests in favor of testcontainers-based tests Remove 4 legacy test files that required dj_local_conf.json and exp0.2 data: - test_acquisition.py (redundant with TestEpochIngestion, TestChunkIngestion) - test_pipeline_instantiation.py (low value, implicitly tested) - test_qc.py (trivial count assertion) - test_tracking.py (exp0.2-specific with unused save_test_data function) Clean up conftest.py: - Remove orphaned fixtures (_dj_config, pipeline, _experiment_creation, etc.) - Remove helper functions (load_pipeline, drop_schema, data_dir) - Remove test_params fixture with hard-coded exp0.2 values Update SPEC_TESTING.md: - Update directory structure to reflect current state - Rename test_ingestion_golden.py to test_full_ingestion.py - Remove unused slow marker -357 lines removed, test coverage maintained via new testcontainers tests. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> * feat: add DeviceName table as primary key for ExperimentDevice tables Replace device_serial_number with device_name as the primary key for ExperimentDevice tables. This aligns with Pydantic rig configurations where device names (e.g., "CameraTop", "Feeder1") are the natural identifiers. 
Changes:
- Add DeviceName lookup table (device_name PK, FK to DeviceType)
- Update get_device_template() to use ->DeviceName as PK
- Make device_serial_number an optional attribute (nullable)
- Update insert_device_types() to populate DeviceName table
- Update ingest_epoch_metadata_from_rig() to use device_name as key
- Delete streams.py (auto-regenerated with new schema)

Benefits:
- Natural query keys: & {"device_name": "CameraTop"}
- Devices without serial numbers (e.g., LightCycle) now supported
- COM port uniqueness issue resolved (device_name is unique per experiment)
- Hardware tracking preserved via optional device_serial_number attribute

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* fix: strip timezone from chunk timestamps for naive datetime comparison

io_api.load() returns UTC-aware timestamps (via pd.to_datetime(utc=True)), but epoch directory names are parsed with datetime.strptime(), which returns naive timestamps. Comparing the two raises TypeError. MySQL datetime columns also reject timezone offsets (+00:00).

Fix: call tz_localize(None) on chunk data immediately after loading, in both ingest_epochs and ingest_chunks. This strips the UTC offset while preserving the datetime values, keeping all downstream comparisons and DB inserts naive.

* fix: handle MariaDB json-as-longtext for metadata insert and fetch

MariaDB 10.3 aliases the `json` column type to `longtext`. DataJoint reads the actual MySQL column type, so attr.json is False and the automatic json.dumps()/json.loads() serialization never fires.

Insert side (acquisition.py): manually json.dumps(epoch_config["metadata"]) before EpochConfig.Meta.insert1(). Placed after ingest_epoch_metadata_from_rig(), which needs the dict.

Fetch side (load_metadata.py): guard with isinstance(str) check and json.loads() in get_stream_reader_for_epoch() before passing metadata to Pydantic model_validate(), which expects a dict.
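The fetch-side guard described for the json-as-longtext case can be sketched in a few lines. `ensure_dict` is a hypothetical helper name and the sample metadata is invented. The point is only that a value coming back from the database may be either a JSON string or an already decoded dict.

```python
import json
from typing import Any


def ensure_dict(metadata: Any) -> dict:
    """Decode metadata if the backend returned it as a JSON string."""
    if isinstance(metadata, str):
        metadata = json.loads(metadata)
    if not isinstance(metadata, dict):
        raise TypeError(f"expected dict metadata, got {type(metadata).__name__}")
    return metadata


# Insert side: serialize by hand when the column is really longtext.
stored = json.dumps({"rig": {"CameraTop": {"serial": None}}})

# Fetch side: both representations yield the same dict.
from_text = ensure_dict(stored)
from_dict = ensure_dict({"rig": {"CameraTop": {"serial": None}}})
```

Because the guard accepts both forms, the same code path works whether the server exposes a native json column or aliases it to longtext.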
* fix: handle stale module state and json metadata in integration tests The full_pipeline fixture now patches db_prefix and streams_maker.schema_name to match the golden test config, clears cached streams module, and re-activates pipeline schemas when other integration tests have set a different prefix. Also handle MariaDB json-as-longtext for metadata assertions in tests. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: clean streams.py to catalog-only baseline for stable test imports streams.py previously contained experiment-specific dynamic tables that caused test failures due to stale schema references on import. Now the committed file has only catalog tables (StreamType, DeviceType, DeviceName, Device) while streams_maker.main() continues to append dynamic tables at runtime. Test fixtures use schema.activate() with database reset to rebind schemas across different test prefixes without needing file deletion. Also fixes ruff lint/format violations, adds save/restore of db_prefix in fixture teardown to prevent state leakage, and adds logging to teardown exception handlers. 
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * refactor: address PR #524 review feedback for load_metadata.py - Move `import typing` and `import re` to module-level imports instead of importing inside individual functions - Remove duplicate `import json` inside get_stream_reader_for_epoch (already imported at module level) - Fix Pydantic V3 deprecation: use `type(rig).model_fields` instead of `rig.model_fields` (instance-level access will break in Pydantic V3) - Fix ruff B007: rename unused loop variables to underscore-prefixed - Apply ruff format to normalize quote style Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * refactor: address remaining PR #524 review feedback - Move dict_to_uuid to utils/hashing.py (re-exported from utils/__init__ for backward compatibility) - Fix device_sn type hint: dict[str, str | None] (values can be None) - Use setdefault for device_info dict initialization - Use list comprehension for table_attribute_entry construction Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * refactor: use cls.__name__ for device type and @data_reader for device detection Replace device_type field (which gives inherited parent class names like "HarpOutputExpander" for Feeder) with cls.__name__ for DeviceType catalog entries. Detect devices by presence of @data_reader methods instead of device_type field, decoupling from the deprecated device_type property. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * ci: add unit test workflow for DJ pipeline PRs Runs pytest -m unit on PRs to datajoint_pipeline branch using uv. No database or golden datasets required. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * ci: add integration tests to DJ pipeline CI workflow Run unit and integration tests as parallel jobs. Integration tests use testcontainers MySQL (no golden datasets). Golden-dataset tests skip automatically in CI. 
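For readers unfamiliar with the dict_to_uuid helper referenced in this log, the sketch below shows one common way such a helper is written. It assumes an MD5 digest over sorted key/value pairs; the actual implementation in utils/hashing.py is not shown in this log and may differ. The `stream_hash` wrapper and the reader strings are hypothetical, and hashing only (stream_type, stream_reader) follows the description in the spec-update commit.

```python
import hashlib
import uuid


def dict_to_uuid(key: dict) -> uuid.UUID:
    """Derive a deterministic UUID from a dict (assumed MD5-based scheme)."""
    hashed = hashlib.md5()
    for k, v in sorted(key.items()):
        hashed.update(str(k).encode())
        hashed.update(str(v).encode())
    return uuid.UUID(hex=hashed.hexdigest())


def stream_hash(stream_type: str, stream_reader: str) -> uuid.UUID:
    """Hash only the stream type and reader path; kwargs are left out."""
    return dict_to_uuid({"stream_type": stream_type, "stream_reader": stream_reader})


# Same stream type, two hypothetical reader implementations.
hash_a = stream_hash("Encoder", "readers.module_a:Encoder")
hash_b = stream_hash("Encoder", "readers.module_b:Encoder")
hash_a_again = stream_hash("Encoder", "readers.module_a:Encoder")
```

A deterministic hash lets the same stream type coexist under different reader implementations, and it stays stable across runs so catalog rows can be re-inserted idempotently.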
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * docs: update SPEC files to match current implementation Sync both spec files with actual code after cls.__name__, @data_reader detection, and CI workflow changes. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: strengthen stream tests and add fetch_stream integration tests - Assert sample_count > 0 in stream data tests (not just entry existence) - Add TestFetchStream class with 5 tests: DataFrame return, video columns, harp columns, drop_pk behavior, and timestamp rounding - Update uv.lock for aeon_exp_foragingABC Rig fix (BaseSchema inheritance) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * refactor: adopt aeon_exp package path convention Update devices_schema import strings from swc.aeon.exp to swc.aeon_exp to match the convention established in the production linear-drive branch of aeon_exp_foragingABC. Update uv.lock to latest tn/data-api-updated. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * chore: update aeon_exp_foragingABC to main branch, swc-aeon to v0.2.0 aeon_exp_foragingABC main is now the production branch (tn/data-api-updated merged). swc-aeon updated to v0.2.0 to satisfy new dependency requirement. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix(ci): split test extras to avoid private repo dependency in CI Move swc-aeon-rigs-foragingabc from [test] to [test-golden] extra. CI uses --extra test (no private repo access needed), local dev uses --extra test-golden for golden dataset tests. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix(ci): skip foragingABC-dependent tests when package not installed TestPopulateCatalogFromPydantic imports swc.aeon_exp at runtime via get_experiment_pydantic(). Skip gracefully in CI where the private swc-aeon-rigs-foragingabc package is not installed. 
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: pin requires-python to <3.13 for numpy compatibility
  numpy<2 from aeon_exp_foragingABC is unsolvable for Python 3.13+.
  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* Use module level pytestmark
* Remove tox.ini - config should go into pyproject.toml
* Run ryuk as a privileged container
* Reorganise dependencies in pyproject.toml
* Combine uv setup and python installation in test_dj_pipeline workflow
* Update uv.lock
* Update pre-commit hook versions
* Remove removed UP038 rule
* Remove unused dict_to_uuid import
* Remove non-existent and unused imports + ruff errors
  - PLR2004 Magic value
  - SIM102 Use a single `if` instead of nested
  - E501 Line too long
* Update uv.lock
* Move example usage to create_experiment docstring
* Store reused string as variable
* Fix device_sn return typehint
* Fix unrecognised Pyright settings
* Add default excludes to pyright config
* Enable pyright for dj_pipeline, disable for tests and docs
* Simplify directory path construction in create_experiment
* Disable reportOperatorIssue in pyright config
  - datajoint operator "&" not supported for types
* Break load_metadata/streams_maker import cycle and simplify code
  - replace streams_maker.schema_name with get_schema_name("streams")
  - minor simplifications to _extract_kwargs_from_reader, extract_active_regions, and get_device_info
  - suppress reportUnusedFunction error for _flatten_rig_devices, _extract_device_mapper_from_rig
* Downgrade reportPossiblyUnbound to warning in pyright config
* Ignore docstrings, magic number checks for ruff linting in tests
* Temporarily downgrade reportImportCycles to warning in pyright config
* Rename _flatten_rig_devices to flatten_rig_devices
* Refactor get_data_reader_methods and get_reader_kwargs_from_device_class
* Use set for device_stream_map for unique stream type and hash pairs
* Bump version to 0.3.0 in pyproject.toml
* Refactor dedupe logic in catalog population functions + simplify device mapper signature
* Fix extract_active_regions None object-handling
* Update uv.lock
* cleanup: remove swc-aeon-rigs dependency, update SPEC_STREAMS_MAKER
  - Remove swc-aeon-rigs from uv.sources (no longer a dependency)
  - Update package dependency diagram: remove swc.aeon.rigs column
  - Fix Rig(DataSchema) → Rig(BaseSchema) in docs and examples
  Addresses review comments from @lochhh on PR #524.
  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: restore single-prefix test fixtures and fix DJ 2.x compatibility
  - Restore single TEST_DB_PREFIX approach (eliminates session-scope fixture conflict between load_metadata and golden dataset tests)
  - Fix _flatten_rig_devices → flatten_rig_devices rename in acquisition.py
  - Add semantic_check=False to key_source join in streams_maker
  - Fix remaining DJ 0.14 config patterns in test fixtures
  Some integration test failures remain due to function signature mismatches between Chang Huan's load_metadata refactors and our test fixtures - to be addressed in follow-up.
  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: resolve remaining DJ 2.x compatibility issues after merge
  - Add semantic_check=False to aggr join in load_metadata.py (previous_epoch lineage mismatch)
  - Fix get_device_mapper_from_rig() call signature in tests (metadata_filepath arg removed in refactor)
  - Revert key_source to * join (lineage check skipped when ~lineage table absent in testcontainers)
  All 55 integration tests pass.
  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: use join(semantic_check=False) instead of * for key_source
  The * operator triggers DJ 2.x lineage checks which fail when Chunk and ExperimentDevice have epoch_start with different lineages. Using join(semantic_check=False) is production-safe - works with lineage tables present (persistent DB), not just in testcontainers.
  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* style: fix ruff linting issues in changed files
  - Fix import sorting (I001)
  - Fix line length (E501)
  - Add docstrings to codec methods (D102)
  - Rename unused loop vars (B007)
  - Add D101, S110, SIM105 to test ruff ignores
  - Shorten table definition comment
  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: rename _flatten_rig_devices to flatten_rig_devices in unit tests
  Matches the public rename done in load_metadata.py refactor.
  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix(ci): skip tests requiring private aeon_exp package in CI
  - Add pytest.importorskip for swc.aeon_exp in test_full_ingestion.py (from lochhh's commit 8c4c5c9)
  - Add skipif guard on TestPopulateCatalogFromPydantic
  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: address review findings - DJ 2.x cleanup pass
  - dj.schema() → dj.Schema() in ephys.py, spike_sorting.py, spike_sorting_curation.py
  - Remove populate/ folder (DJ 2.x Worker not supported, was removed in original DJ 2.x upgrade but re-added via merge)
  - Remove aeon_ingest script entry point (references removed populate/)
  - Reset streams.py to catalog-only (78 lines)
  - Fix dj.config['custom'] and dj.config['database.*'] bracket access in ephys_v2_setup.py
  - Update SPEC_TESTING.md code examples to DJ 2.x API
  - Remove commented-out DJ 0.14 config in streams_maker.py
  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: reset streams.py to catalog-only (test-regenerated again)
  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* Handle non-finite values in column_stats
* Handle empty index case in timestamp_stats
  Co-authored-by: Copilot <copilot@github.com>
* Move stats functions to separate module and update imports
  Co-authored-by: Copilot <copilot@github.com>
* Skip checks for hardcoded password and import placement in tests
  Co-authored-by: Copilot <copilot@github.com>
* Restore original streams.py in streams_schema teardown
* Update pyright version in pre-commit-config
* Move codec integration tests into test/dj_pipeline/utils
* Remove leftover smoke test
* Use module-level pytestmark for load_metadata tests
* Improve type hint in fetch_streams
* Use module-level pytestmark for full ingesti…

esutlie and others added 2 commits March 12, 2026 14:59

[pre-commit.ci] auto fixes from pre-commit.com hooks

bdfbd27

for more information, see https://pre-commit.ci

alejoe91 added the "container" label (Issues related to container (docker/singularity) versions of sorters) Mar 13, 2026

alejoe91 added this to the 0.104.0 milestone Mar 13, 2026

alejoe91 approved these changes Mar 13, 2026


alejoe91 merged commit 652f856 into SpikeInterface:main Mar 16, 2026
15 checks passed

esutlie mentioned this pull request Mar 16, 2026

Pin spikeinterface to upstream (fix merged) SainsburyWellcomeCentre/aeon_mecha#540

Merged

lochhh mentioned this pull request Mar 24, 2026

Migrate pipeline metadata ingestion from dotmap device-schemas to Pydantic SainsburyWellcomeCentre/aeon_mecha#524

Merged

Fix container pip install for Singularity (follow-up to #4380) #4438

alejoe91 merged 2 commits into SpikeInterface:main from esutlie:fix/pip-direct-url-quotes
