Fix container pip install for Singularity (follow-up to #4380) #4438
Merged
Pass the pip command as a list instead of a string in install_package_in_container() for github installation mode. The previous fix (PR SpikeInterface#4380) used a quoted string format which works on Docker (shlex.split strips quotes) but fails on Singularity (str.split keeps quotes attached). Passing a list bypasses both splitting mechanisms, keeping the PEP 508 Direct URL requirement as a single argument on both backends. Fixes SpikeInterface#4368
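The difference between the two splitting mechanisms can be reproduced with the standard library alone. This is a minimal sketch, not the SpikeInterface code: the requirement URL is a placeholder and all variable names are illustrative.

```python
import shlex

# Placeholder PEP 508 direct-URL requirement (illustrative, not the real pin).
requirement = "spikeinterface @ git+https://github.com/SpikeInterface/spikeinterface.git"

# Quoted-string form, as one would type it in a shell.
command_str = f"pip install '{requirement}'"

# shlex.split honors the quotes and strips them, so the requirement
# stays a single argument.
shlex_args = shlex.split(command_str)

# str.split only breaks on whitespace: the quotes stay attached and the
# requirement is torn into three pieces.
naive_args = command_str.split()

# List form: nothing is split afterwards, so no quoting is needed at all.
command_list = ["pip", "install", requirement]

print(shlex_args)
print(naive_args)
print(command_list)
```

With the list form, the argument boundaries are fixed by the caller, which is why it behaves the same regardless of how a backend would have tokenized a string.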
Thank you Elissa! LGTM
alejoe91 approved these changes on Mar 13, 2026
esutlie added a commit to SainsburyWellcomeCentre/aeon_mecha that referenced this pull request on Mar 16, 2026
The Singularity container pip install fix (esutlie/spikeinterface fix/pip-direct-url-quotes) has been merged upstream as SpikeInterface/spikeinterface#4438. Switch from fork pin to upstream main at the merge commit. Can move to a release pin (0.104.0) when it's available.
esutlie added a commit to SainsburyWellcomeCentre/aeon_mecha that referenced this pull request on Mar 16, 2026
…nterface-pin Merging this myself since it's a dependency-only change. The fork fix (Singularity container pip install) was merged upstream today as SpikeInterface/spikeinterface#4438, so this just switches the pin from my fork to the upstream merge commit. No code changes. Moving the v0.2.0 tag forward to include this.
lochhh added a commit to SainsburyWellcomeCentre/aeon_mecha that referenced this pull request on Apr 27, 2026
* fix: handle stale module state and json metadata in integration tests
The full_pipeline fixture now patches db_prefix and streams_maker.schema_name
to match the golden test config, clears cached streams module, and re-activates
pipeline schemas when other integration tests have set a different prefix.
Also handle MariaDB json-as-longtext for metadata assertions in tests.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: clean streams.py to catalog-only baseline for stable test imports
streams.py previously contained experiment-specific dynamic tables that
caused test failures due to stale schema references on import. Now the
committed file has only catalog tables (StreamType, DeviceType, DeviceName,
Device) while streams_maker.main() continues to append dynamic tables at
runtime. Test fixtures use schema.activate() with database reset to rebind
schemas across different test prefixes without needing file deletion.
Also fixes ruff lint/format violations, adds save/restore of db_prefix in
fixture teardown to prevent state leakage, and adds logging to teardown
exception handlers.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* refactor: address PR #524 review feedback for load_metadata.py
- Move `import typing` and `import re` to module-level imports instead
of importing inside individual functions
- Remove duplicate `import json` inside get_stream_reader_for_epoch
(already imported at module level)
- Fix Pydantic V3 deprecation: use `type(rig).model_fields` instead of
`rig.model_fields` (instance-level access will break in Pydantic V3)
- Fix ruff B007: rename unused loop variables to underscore-prefixed
- Apply ruff format to normalize quote style
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* refactor: address remaining PR #524 review feedback
- Move dict_to_uuid to utils/hashing.py (re-exported from utils/__init__
for backward compatibility)
- Fix device_sn type hint: dict[str, str | None] (values can be None)
- Use setdefault for device_info dict initialization
- Use list comprehension for table_attribute_entry construction
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* refactor: use cls.__name__ for device type and @data_reader for device detection
Replace device_type field (which gives inherited parent class names like
"HarpOutputExpander" for Feeder) with cls.__name__ for DeviceType catalog
entries. Detect devices by presence of @data_reader methods instead of
device_type field, decoupling from the deprecated device_type property.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
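A rough illustration of why an inherited label is misleading and how decorator-based detection can work. Everything here is an assumption made for the sketch: `data_reader` is a hypothetical stand-in that only tags a function, `device_type` is modeled as a plain class attribute, and the class bodies are invented. The real decorator and device classes may differ.

```python
import inspect

def data_reader(func):
    """Hypothetical stand-in for the real decorator: it only tags the function."""
    func._is_data_reader = True
    return func

class HarpOutputExpander:
    # Class-level label; subclasses inherit it unless they override it.
    device_type = "HarpOutputExpander"

    @data_reader
    def beam_break(self):
        return "beam_break reader"

class Feeder(HarpOutputExpander):
    pass

class NotADevice:
    def helper(self):
        return None

def is_device(cls):
    """Treat a class as a device if any of its functions is tagged as a reader."""
    return any(
        getattr(member, "_is_data_reader", False)
        for _, member in inspect.getmembers(cls, inspect.isfunction)
    )

inherited_label = Feeder.device_type  # the parent's name leaks through
class_label = Feeder.__name__         # always the concrete class name

print(inherited_label, class_label, is_device(Feeder), is_device(NotADevice))
```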
* ci: add unit test workflow for DJ pipeline PRs
Runs pytest -m unit on PRs to datajoint_pipeline branch using uv.
No database or golden datasets required.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* ci: add integration tests to DJ pipeline CI workflow
Run unit and integration tests as parallel jobs. Integration tests use
testcontainers MySQL (no golden datasets). Golden-dataset tests skip
automatically in CI.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* docs: update SPEC files to match current implementation
Sync both spec files with actual code after cls.__name__, @data_reader
detection, and CI workflow changes.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* refactor(ephys): revamp primary keys to subject-centric design
Add ProbeInsertion, TargetArea, InsertionTargetArea tables. Reparent
EphysChunk and EphysBlock under ProbeInsertion instead of
Experiment+Probe. Update ingest_chunks() signature to accept subject
and insertion_number. Update all scripts (launch_si_gui, save_curation,
run_aeon_spike_sorting) to use new PK fields. Pin datajoint==0.14.6.
* feat(ephys): add unit matching and chunked spike output tables
Add UnitMatchingMethod, UnitMatching, UniversalUnit (+Matched part),
and ChunkedSpikeTimes tables to spike_sorting.py. UnitMatching.make()
uses SpikeInterface's compare_two_sorters() to match units across
temporally overlapping sessions via spike time coincidence.
Update restore_raw_sorting() in spike_sorting_curation.py with 3-step
unit matching cleanup (delete UnitMatching, delete Matched rows with
force=True to bypass Part-table protection, delete orphaned
UniversalUnit rows) before deleting SortedSpikes.
* refactor(ephys): v2 PK revamp — remove subject from keys, add EphysEpoch/EphysSubject
Primary key changed from (experiment_name, subject, insertion_number) to
(experiment_name, insertion_number) throughout the ephys pipeline.
Key changes:
- ProbeInsertion: references Experiment (not Experiment.Subject), adds probe_label
- EphysEpoch (Imported): auto-discovers probes from epoch files, creates ProbeInsertion
(Probe must pre-exist — probe_type cannot be inferred from metadata)
- EphysSubject (Manual): FK to Experiment.Subject for subject association
- ingest_chunks: single-arg (experiment_name), loops ProbeInsertions, dynamic sync reader
- NeuropixelsV2 StreamGroup added for V2 hardware sync file support
- Ceph overwrite protection: FileExistsError checks in PreProcessing/SpikeSorting/PostProcessing
- All spike times as datetime64[ns] (absolute)
- Subject removed from all scripts (launch_si_gui, save_curation, run_aeon_spike_sorting)
* test(ephys): add v2 setup and validation scripts for HPC testing
ephys_v2_setup.py: 20-step test setup script (experiment creation through
spike sorting) for validating the v2 pipeline against AEONX1/social-ephys0.1.
ephys_v2_validate.py: comprehensive validation checks for PK structure,
table contents, and data integrity after running the setup script.
* docs: add unit matching design spec
Covers schema design, matching algorithm, ownership convention for
overlapping chunks, and re-processing workflows. Key design decisions:
UnitMatchingMethod as non-PK FK, Matched as Part of UnitMatching,
ChunkedSpikeTimes with unique index enforcement.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* docs: enforce temporal ordering via key_source + make() guard
Replace soft "process in order" convention with two-level enforcement:
key_source only yields earliest unprocessed block per insertion,
make() guard rejects out-of-order direct calls. Safe under parallel
populate(reserve_jobs=True).
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix(spec): scope unique index per-insertion for multi-probe subjects
The unique index on ChunkedSpikeTimes was (universal_unit, chunk_start),
which would reject valid rows from different insertions that share the
same universal_unit ID. Fixed to include the full insertion scope:
(experiment_name, subject, insertion_number, universal_unit, chunk_start).
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
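The effect of the index scope can be reproduced in isolation. The sketch below uses an in-memory SQLite table as a stand-in for the MySQL/MariaDB table; the table name, column types, and row values are illustrative assumptions, and only the two index column lists come from the commit message.

```python
import sqlite3

def try_insert_two_insertions(index_columns):
    """Insert the same (unit, chunk_start) for two insertions; True if accepted."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE chunked_spike_times ("
        "experiment_name TEXT, subject TEXT, insertion_number INTEGER, "
        "universal_unit INTEGER, chunk_start TEXT)"
    )
    conn.execute(
        f"CREATE UNIQUE INDEX uq_scope ON chunked_spike_times ({index_columns})"
    )
    rows = [
        ("exp", "subject1", 1, 7, "2026-01-01 00:00:00"),
        # Different probe insertion that happens to reuse unit ID 7.
        ("exp", "subject1", 2, 7, "2026-01-01 00:00:00"),
    ]
    try:
        conn.executemany(
            "INSERT INTO chunked_spike_times VALUES (?, ?, ?, ?, ?)", rows
        )
        return True
    except sqlite3.IntegrityError:
        return False
    finally:
        conn.close()

# Narrow index: the second row is rejected even though it is a valid row.
narrow_ok = try_insert_two_insertions("universal_unit, chunk_start")

# Index scoped to the full insertion key: both rows are accepted.
scoped_ok = try_insert_two_insertions(
    "experiment_name, subject, insertion_number, universal_unit, chunk_start"
)

print(narrow_ok, scoped_ok)
```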
* fix(spec): use datetime64[ns] for ChunkedSpikeTimes spike_times
Match the format used by SyncedSpikes.Unit rather than converting to
float64 epoch seconds. Conversion to epoch seconds only happens
internally during the matching algorithm for SpikeInterface compatibility.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* refactor(spec): rename Part tables and add electrode to Unit
Rename Part tables for consistency with existing pipeline pattern:
- UnitMatching.Matched → UnitMatching.Unit
- UnitMatching.ChunkedSpikeTimes → UnitMatching.Spikes
Add denormalized peak electrode FK (-> ephys.ElectrodeConfig.Electrode)
to UnitMatching.Unit for query convenience.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* refactor(spec): rename UniversalUnit to GlobalUnit, move electrode to unit roster
- Rename UniversalUnit → GlobalUnit, universal_unit → global_unit throughout
- Move electrode FK from UnitMatching.Unit to GlobalUnit (ProbeType.Electrode)
- Rewrite query patterns section around UX workflow: insertion → units → spikes
- Add algorithm step for updating electrode on matched units
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* feat: implement unit matching system per spec
- Replace UniversalUnit with GlobalUnit (Manual, electrode on ProbeType.Electrode)
- Restructure UnitMatching: UnitMatchingMethod as non-PK FK
- Add UnitMatching.Unit Part (replaces UniversalUnit.Matched)
- Add UnitMatching.Spikes Part with ownership convention + unique index
(replaces standalone ChunkedSpikeTimes Computed)
- Enforce temporal ordering via key_source MIN(block_start) gate + make() guard
- Simplify restore_raw_sorting(): no more force=True, clean cascade
- Fix spec: master insert before parts (DataJoint requirement)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* refactor: replace 'session' terminology with 'EphysBlock'/'block'
Use the pipeline's formal 'EphysBlock' terminology consistently
across code comments, log messages, docstrings, and spec to
avoid confusion with the ambiguous 'session' term.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* refactor(spec): remove subject from PK per upstream v2 revamp
Update spec to reflect upstream ProbeInsertion PK change:
(experiment_name, subject, insertion_number) → (experiment_name, insertion_number)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* refactor(spec): restore subject to PK, redesign EphysEpoch and EphysChunk
- Restore subject to ProbeInsertion PK (surgical event model, probe-swap correctness)
- Eliminate EphysSubject table (subject now structurally required in PK)
- Redesign EphysEpoch as per-epoch probe registry with .Insertion Part table
- Move probe_label from ProbeInsertion to EphysEpoch.Insertion (epoch-level detail)
- Add EphysEpoch FK to EphysChunk for epoch traceability
- Add subject guard to ingest_chunks (skip files without subject-probe mapping)
- Update all PKs, unique indexes, query patterns, and ERD throughout spec
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* feat(ephys): restore subject to PK, redesign EphysEpoch and ingest_chunks
Implement upstream schema changes per SPEC_unit_matching.md Section 2:
- ProbeInsertion: restore subject to PK, remove probe_label
- EphysSubject: eliminate (subject now structurally in PK)
- EphysEpoch: redesign as per-epoch probe registry with .Insertion Part
table, carry-forward semantics, auto-Probe creation
- EphysChunk: add FK to EphysEpoch for epoch traceability
- ingest_chunks(): epoch-aware file routing with subject guard
- All downstream insertion_key, key_source, unique index include subject
- PreProcessing.infer_output_dir() includes subject in path
- Update support scripts for new schema structure
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* stub _read_probe_assignments as NotImplementedError placeholder
The exact file format and carry-forward logic will be determined
once experimental data conventions are finalized.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* refactor(ephys): extract helpers into utils/ephys_utils.py
Move probe discovery, metadata parsing, sync model processing,
and probe type creation out of ephys.py into a dedicated utils module.
ephys.py now contains only table definitions and orchestration logic.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* feat(spike_sorting): add UnitMatchingParamSet for parameterizable matching
- New UnitMatchingParamSet Lookup (method + params longblob)
- UnitMatching: paramset in PK, enabling parallel matching runs
- GlobalUnit: paramset as non-PK FK (provenance, not identity)
- key_source serialized per (insertion, paramset)
- Spec updated with new table, ERD, and design rationale
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* feat(spike_sorting): add pair-based unit matching alternative (UnitMatchingPair + PairMatching)
Implements an alternative to the sequential UnitMatching design where users
explicitly specify which two blocks to compare. Key design choices:
- UnitMatchingPair (Manual): user picks two SyncedSpikes + paramset
- PairMatching (Computed): performs matching for the specified pair
- PairMatching.Unit: keyed by GlobalUnit, records both earlier_unit and
latter_unit via nullable projected FKs to SortedSpikes.Unit
- Canonical ordering (earlier < latter) prevents duplicate pairs
- Anchor/seed logic determines which block has existing GlobalUnit assignments
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* refactor(spike_sorting): replace PairMatching with seed-based bidirectional UnitMatching
Add seed_block_start to UnitMatchingParamSet so scientists can control
where matching starts (best signal quality block). Rewrite key_source
as pure SQL with seed-aware frontier logic that propagates outward in
both directions. Replace temporal ordering guard with seed-first +
overlap check. Remove PairMatching/UnitMatchingPair (rejected alternative).
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
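The seed-first, outward-propagating order can be pictured with a small pure-Python sketch. This is not the SQL key_source from the commit: the function name and arguments are hypothetical, and simple adjacency in sorted order stands in for the temporal overlap check.

```python
def frontier(block_starts, seed, processed):
    """Return the blocks that are eligible to be matched next.

    block_starts: sortable block start values for one insertion
    seed: the block chosen as the starting point
    processed: set of blocks that have already been matched
    """
    ordered = sorted(block_starts)
    if seed not in processed:
        # Seed-first: nothing else may run before the seed block.
        return [seed]
    eligible = []
    for i, block in enumerate(ordered):
        if block in processed:
            continue
        # Neighbors on either side (adjacency stands in for overlap here).
        neighbors = ordered[max(i - 1, 0):i] + ordered[i + 1:i + 2]
        if any(n in processed for n in neighbors):
            eligible.append(block)
    return eligible

blocks = [0, 1, 2, 3, 4]
step0 = frontier(blocks, seed=2, processed=set())
step1 = frontier(blocks, seed=2, processed={2})
step2 = frontier(blocks, seed=2, processed={1, 2, 3})
done = frontier(blocks, seed=2, processed={0, 1, 2, 3, 4})

print(step0, step1, step2, done)
```

Each call only exposes blocks that touch already-matched territory, so matching spreads from the seed in both directions without skipping a block.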
* feat: upgrade core pipeline to datajoint 2.x
- Update DJ config from dict-based to Pydantic attribute-based API
- Convert fetch patterns: fetch(as_dict=True) -> to_dicts(), fetch("col") -> to_arrays(), fetch(format="frame") -> to_pandas(), fetch("KEY") -> keys()
- Fix blob detection: type == "longblob" -> is_blob (handles DJ 2.x <blob> codec)
- Convert bare int/float types to int32/float32 in table definitions
- Replace dj.schema() with dj.Schema(), populate(limit=) with populate(max_calls=)
- Fix DJ 2.x lineage check in load_metadata aggr join (semantic_check=False)
- Use dynamic module reference in paths.py for repository_config patching
- Simplify test fixtures: set database_prefix before pipeline import so schemas activate naturally with test prefix, eliminating manual re-decoration
- Remove populate/worker.py (deferred to future work)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* chore: bump version to 0.4.0 for datajoint 2.x upgrade
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* chore: update uv.lock for v0.4.0
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: update streams.py longblob -> <blob> for DJ 2.x compatibility
The auto-generated streams.py had legacy longblob type which DJ 2.x
treats as raw bytes (no serialization). Replace with <blob> to match
the streams_maker.py template.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* Update aeon/dj_pipeline/docs/specs/SPEC_unit_matching.md
Co-authored-by: Elissa Sutlief <elissasutlief@gmail.com>
* fix: strengthen stream data tests to assert sample_count > 0
Tests previously only checked that stream entries existed, not that they
contained actual data. All streams had sample_count=0 due to empty
reader pattern prefixes (see #536). Now tests assert sample_count > 0.
Also adds fetch_stream integration tests and updates uv.lock for
aeon_exp_foragingABC Rig fix (DataSchema + BaseSchema inheritance).
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: strengthen stream tests and add fetch_stream integration tests
- Assert sample_count > 0 in stream data tests (not just entry existence)
- Add TestFetchStream class with 5 tests: DataFrame return, video columns,
harp columns, drop_pk behavior, and timestamp rounding
- Update uv.lock for aeon_exp_foragingABC Rig fix (BaseSchema inheritance)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* Adapt test scripts for tn/ephys_revamp_v2 and add auto-approval curation path
Setup/validate scripts updated for v2 PK structure (subject in PK),
new table names (GlobalUnit, UnitMatching.Spikes), manual ProbeInsertion
setup, single electrode group, and consolidated Phase 3 steps.
ApplyOfficialCuration.make() now handles auto-approved curations natively
(no file + parent_curation_id=-1) so the setup script uses .populate()
instead of allow_direct_insert=True.
* Update DB prefix to u_elissas_aeon_ephys_v2_test_ to match existing grant
* Insert Subject before Experiment.Subject (FK dependency)
* Fix SQL syntax error: remove quotes from TargetArea comment
MariaDB chokes on double quotes inside DJ column comments.
* Insert ProbeType manually (HPC has no internet for probeinterface)
* Skip epoch ingestion, insert Epoch manually in step 5
Ephys-only data (social-ephys0.1) has no CameraTop/FrameTop chunk
files for Epoch.ingest_epochs() to discover. Insert the Epoch entry
directly from the known epoch directory name in step 5 alongside
ProbeInsertion and EphysEpoch.
* Update SLURM scripts for v2 test: add subject to key, use uv instead of conda
* Pin spikeinterface to esutlie fork to fix Singularity pip install quoting bug
* refactor: adopt aeon_exp package path convention
Update devices_schema import strings from swc.aeon.exp to swc.aeon_exp
to match the convention established in the production linear-drive branch
of aeon_exp_foragingABC. Update uv.lock to latest tn/data-api-updated.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* Run spike sorting for remaining 3 blocks in a single SLURM job
* Fix CurationMethod contents: each entry must be a separate tuple
* Fix UnitMatching.key_source: project to PK before union to avoid secondary attribute conflict
* Add unit matching QC plots: gantt chart, heatmap, yield, longevity, spike consistency
* Add diagnostic script to debug unit matching failures
* Add repair script to re-ingest chunks with missing SyncModel entries
* Fix repair script: delete EphysBlockInfo before EphysChunk (part table constraint)
* Fix timezone-aware datetime in SyncModel insert (MariaDB rejects +00:00 suffix)
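The commit message does not say how the value was normalized. One common approach, shown here purely as an assumption, is to convert to UTC and drop the tzinfo before inserting, so the serialized value carries no offset suffix. The helper name is hypothetical.

```python
from datetime import datetime, timedelta, timezone

def to_naive_utc(dt):
    """Convert an aware datetime to UTC and drop tzinfo; pass naive values through."""
    if dt.tzinfo is None:
        return dt
    return dt.astimezone(timezone.utc).replace(tzinfo=None)

aware = datetime(2026, 3, 16, 12, 0, 0, tzinfo=timezone.utc)
naive = to_naive_utc(aware)

# A non-UTC offset is shifted to UTC before the tzinfo is removed.
shifted = to_naive_utc(
    datetime(2026, 3, 16, 14, 0, 0, tzinfo=timezone(timedelta(hours=2)))
)

print(aware.isoformat(sep=" "))  # carries a +00:00 suffix
print(naive.isoformat(sep=" "))  # no offset suffix
```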
* Fix get_matching() API: returns tuple of two Series, not a dict
* QC gantt chart: solid lines spanning blocks, thinner and more compact
* Gantt chart: give single-block units a minimum bar width
* Gantt chart: bars span full block width edge-to-edge
* Add targeted diagnostic for block 0↔1 matching failure
Replicates UnitMatching.make() logic step-by-step, printing
intermediate results. Also compares at various delta_time thresholds
to distinguish code bug from genuine lack of matching units.
* Fix: get_matching() returns Series, not dict — iterate values directly
* Add temporal offset measurement to block matching diagnostic
* Add chunk-level spike time comparison to distinguish sync vs sorting issue
* Add diagnostic script to measure blob sizes in ephys tables
Queries actual LENGTH() of longblob columns to identify candidates
for externalization to blob@dj_store.
* Add fallback to old test schema for blob size measurement
Checks elissas_aeon_ephys_test_ tables when v2 test tables
don't exist, so we can measure SortedSpikes and SyncedSpikes.
* Move large spike blobs to external storage (blob@dj_store)
Switch longblob → blob@dj_store for columns averaging >100 KB:
- SortedSpikes.Unit: spike_indices, spike_sites, spike_depths
- SyncedSpikes.Unit: spike_times
- UnitMatching.Spikes: spike_times
Data stored as files on ceph (/ceph/aeon/aeon/dj_store), DB keeps
only a 16-byte UUID reference. Reduces DB size significantly for
units with ~200K+ spikes per entry.
* Clean up scripts for PR
- Remove one-off debug/diagnostic scripts:
debug_block01_matching.py, debug_unit_matching.py,
repair_sync_models.py, measure_blob_sizes.py
- Replace hardcoded DB prefix with config-based safety check
that blocks production prefix and host
- Add single/multi block modes to run_aeon_spike_sorting.py
* Pin spikeinterface to upstream main (fix merged as PR #4438)
The Singularity container pip install fix (esutlie/spikeinterface
fix/pip-direct-url-quotes) has been merged upstream as
SpikeInterface/spikeinterface#4438. Switch from fork pin to upstream
main at the merge commit. Can move to a release pin (0.104.0) when
it's available.
* feat: implement AeonStreamCodec for lazy-loading stream data
Replace blob-based storage in auto-generated stream tables with a
codec-based approach:
- Individual data columns store JSON summary stats (min, max, mean, dtype)
- timestamps column stores JSON with time range and sampling rate
- stream_df column uses <aeon_stream> codec: stores JSON reference,
returns full DataFrame from raw files on fetch
- ~3700x storage reduction in MySQL (~1.6MB → ~0.4KB per row)
Changes:
- New: aeon/dj_pipeline/utils/codec.py (AeonStreamCodec, column_stats, timestamp_stats)
- Updated: streams_maker.py table definition and make() method
- Updated: fetch_stream() to support both codec and legacy blob tables
- Updated: test fixtures to handle streams.py regeneration
- New: TestCodecStreamData regression tests
- New: test_codec.py unit tests for helper functions
- Added: scripts/test_load_stream.py prototype/validation script
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* refactor: remove legacy blob path from fetch_stream
Full switch to codec-based stream tables — no backward compatibility
needed for blob-based tables.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* cleanup: remove median_dt_ms from timestamp stats
sampling_rate_hz is sufficient; median_dt_ms is redundant (inverse).
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* chore: update aeon_exp_foragingABC to main branch, swc-aeon to v0.2.0
aeon_exp_foragingABC main is now the production branch (tn/data-api-updated
merged). swc-aeon updated to v0.2.0 to satisfy new dependency requirement.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix(ci): split test extras to avoid private repo dependency in CI
Move swc-aeon-rigs-foragingabc from [test] to [test-golden] extra.
CI uses --extra test (no private repo access needed), local dev uses
--extra test-golden for golden dataset tests.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix(ci): skip foragingABC-dependent tests when package not installed
TestPopulateCatalogFromPydantic imports swc.aeon_exp at runtime via
get_experiment_pydantic(). Skip gracefully in CI where the private
swc-aeon-rigs-foragingabc package is not installed.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: pin requires-python to <3.13 for numpy compatibility
numpy<2 from aeon_exp_foragingABC is unsolvable for Python 3.13+.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* Temporarily allow PRs to all branches to trigger test_dj_pipeline workflow
* Migrate pytest.ini to pyproject.toml and ignore specific warnings
* Resolve deprecated positional inserts
* Skip full ingestion tests if aeon_exp package is unavailable
* Pin spikeinterface>=0.104.0
* Collect all docs in root level docs dir
* Move and rename spike sorting curation specifications
* Fix typo in aeon/dj_pipeline/ephys.py
Co-authored-by: Elissa Sutlief <elissasutlief@gmail.com>
* feat: rework `streams_maker` utils to be compatible with pydantic-based Experiment/Rig definition
* docs: clean up STREAMS_MAKER.md
* chore: update swc-aeon dependency to version 0.1.0 and add source configuration for uv
* refactor: streamline metadata ingestion in EpochConfig by consolidating imports and updating metadata file handling
* chore: update version to 0.2.2 and remove unused schema definitions in ingestion_schemas.py
* test: add integration tests for load_new_metadata with testcontainers
- Add testcontainers MySQL fixture for zero-config database testing
- Create 24 integration tests covering:
- extract_stream_types_from_device (closure handling for real @data_reader)
- get_device_info and get_stream_entries
- insert_stream_types with duplicate handling
- insert_device_types with FK constraint recovery
- Create 17 unit tests for pure functions (no DB required)
- Use real @data_reader decorator from swc.aeon.schema (no mocking)
- Fix extract_stream_types_from_device to check closure for original function
- Add SPEC_TESTING.md documenting test architecture and patterns
- Remove swc-aeon-rigs from test dependencies (not needed)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
* docs: move STREAMS_MAKER spec to docs/specs/
Relocate spec from aeon/dj_pipeline/utils/STREAMS_MAKER.md to
docs/specs/SPEC_STREAMS_MAKER.md for consistency with SPEC_TESTING.md.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
* refactor: move dict_to_uuid to utils module
Move dict_to_uuid function from aeon/dj_pipeline/__init__.py to
aeon/dj_pipeline/utils/__init__.py for better code organization.
Re-export from __init__.py for backward compatibility.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
* refactor: rename load_new_metadata to load_metadata
Remove "new" suffix now that legacy load_metadata.py no longer exists.
Update all imports, test files, and spec documentation.
Files renamed:
- aeon/dj_pipeline/utils/load_new_metadata.py → load_metadata.py
- tests/.../test_load_new_metadata_unit.py → test_load_metadata_unit.py
- tests/.../test_load_new_metadata_integration.py → test_load_metadata_integration.py
Also adds uv.lock for reproducible builds.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
* docs: clarify test classification and add sample metadata fixture
- Restore corrupted spec files (SPEC_TESTING.md, SPEC_STREAMS_MAKER.md)
- Clarify test tiers: unit, integration (schema + ingestion), specialized
- Add golden datasets vs sample fixtures distinction
- Add ForagingABC_Metadata.json as sample fixture for parsing tests
- Add 17 new tests for ForagingABC metadata parsing
- Update pytest markers to reflect clearer classification
- Bump version to 0.2.3
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
* refactor: extract experiment-agnostic core DataJoint pipeline
Remove experiment-specific code to create a clean, reusable pipeline template:
Removed:
- aeon/schema/ - obsolete DotMap-based device schemas (replaced by Pydantic)
- dj_pipeline/analysis/ - experiment-specific analysis tables
- dj_pipeline/scripts/ - one-off maintenance scripts
- dj_pipeline/create_experiments/ - experiment-specific setup scripts
- dj_pipeline/report.py - experiment-specific reporting
- utils/block_plotting.py, plotting.py, tracking_utils.py
Modified:
- acquisition.py: remove Environment tables (to be Pydantic devices)
- tracking.py: keep only SLEAPTracking, add _get_stream_reader() helper
- worker.py/process.py: remove analysis_worker
- streams_maker.py: add resilience for missing DB connection
Added:
- utils/create_experiment.py: generic experiment creation utility
Core schemas retained: lab, subject, acquisition, tracking (SLEAP),
streams (via streams_maker), qc
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
* refactor: upgrade streams_maker to Pydantic-based approach
- Change StreamType PK from stream_type to stream_hash (UUID)
to handle same stream_type with different reader implementations
- Update EpochConfig.Meta to use JSON type for metadata field
storing original rig_config for Pydantic reconstruction
- Add get_stream_reader_for_epoch() helper for runtime reader resolution
from database-stored metadata without file I/O
- Update get_device_stream_template() make() method to use Pydantic approach
- Remove legacy aeon_schemas references from generated streams.py
- Update SPEC_STREAMS_MAKER.md with new architecture and design decisions
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
* test: update integration tests for new streams schema
- Update test assertions to match new schema (stream_hash PK, no kwargs)
- Fix test fixture to drop existing schema before creating tables
- Update spec to match actual get_device_info() return structure
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
* refactor: simplify Pydantic schema loading with module:Class format
- Rename get_experiment_class() to get_experiment_pydantic()
- Use colon separator format (module.path:ClassName) for schema paths
- Remove extract_rig_from_metadata() and extract_rig_from_metadata_dict()
- Simplify get_stream_reader_for_epoch() to use Experiment.model_validate()
- Remove duplicate _get_stream_reader() from tracking.py
- Update DevicesSchema docstring to clarify new format
- Update SPEC_STREAMS_MAKER.md with new architecture and design decisions
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
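Resolving a `module.path:ClassName` string needs only `importlib`. The helper below is a hypothetical sketch of the format, not the body of get_experiment_pydantic(); a standard-library class is used as the target so the example runs anywhere.

```python
import importlib

def load_class(path):
    """Resolve a 'module.path:ClassName' string to the class object."""
    module_name, sep, class_name = path.partition(":")
    if not sep or not module_name or not class_name:
        raise ValueError(f"expected 'module.path:ClassName', got {path!r}")
    module = importlib.import_module(module_name)
    return getattr(module, class_name)

# Stand-in target from the standard library.
ordered_dict_cls = load_class("collections:OrderedDict")
print(ordered_dict_cls)
```

The colon makes the split unambiguous: everything before it is importable as a module, and everything after it is an attribute lookup, so dotted module paths need no guessing.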
* feat: add stream_reader_kwargs to StreamType for reader constructor args
Implement dynamic storage and retrieval of reader constructor kwargs
(e.g., BitmaskEvent needs value/tag, Harp needs columns) using
inspect.signature to extract parameters from reader instances.
Key changes:
- Add stream_reader_kwargs field to StreamType catalog table
- Add _extract_kwargs_from_reader() using inspect.signature
- Add get_reader_kwargs_from_device_class() for @data_reader methods
- Update populate_catalog_from_pydantic() to extract and store kwargs
- Update get_device_stream_template() to use stored kwargs
Architecture changes (Three Decoupled Steps):
- Step 1: Catalog population at worker startup (not in EpochConfig.make)
- Step 2: Table creation at worker startup via streams_maker.main()
- Step 3: Data population via make() calls (DML only)
Test changes:
- Fix fixture ordering: streams_maker.main(create_tables=False) before
populate_catalog_from_pydantic()
- Add golden dataset integration tests (test_full_ingestion.py)
- Update test expectations for stream_reader_kwargs field
Previously failing streams now work:
- HarpOutputExpanderDeliverPellet (BitmaskEvent with value/tag)
- HarpOutputExpanderBeamBreak (BitmaskEvent with value/tag)
- WeightScaleBaselineEvent (Harp with columns)
- HarpOutputExpanderRetriedDelivery (BitmaskEvent with value/tag)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
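The idea of recovering constructor kwargs via inspect.signature can be sketched as follows. The reader class, its parameters, the `skip` argument, and the example values are all invented for illustration, and the sketch assumes that each constructor argument is stored on the instance under the same name. The actual _extract_kwargs_from_reader() may work differently.

```python
import inspect

class BitmaskEventReader:
    """Hypothetical reader; the real reader classes may look different."""

    def __init__(self, pattern, value, tag):
        self.pattern = pattern
        self.value = value
        self.tag = tag

def extract_kwargs(reader, skip=("pattern",)):
    """Collect constructor arguments that the instance still exposes as attributes."""
    params = inspect.signature(type(reader).__init__).parameters
    return {
        name: getattr(reader, name)
        for name in params
        if name != "self" and name not in skip and hasattr(reader, name)
    }

reader = BitmaskEventReader("example_pattern", value=0x22, tag="ExampleTag")
kwargs = extract_kwargs(reader)

# The stored kwargs are enough to rebuild an equivalent reader later.
rebuilt = BitmaskEventReader("example_pattern", **kwargs)
print(kwargs)
```

Reading the parameter names from the signature avoids a hardcoded list of attributes per reader type, at the cost of relying on the attribute-naming assumption above.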
* docs: update SPEC_STREAMS_MAKER to match implementation
- Update _extract_kwargs_from_reader() to show inspect.signature approach
instead of hardcoded attribute checks
- Rename get_reader_kwargs_from_method to get_reader_kwargs_from_device_class
- Clarify stream_hash uses only (stream_type, stream_reader), not kwargs
- Update edge case #3 to reflect kwargs not included in hash
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
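A deterministic UUID built from only (stream_type, stream_reader) might look like the sketch below. The hashing scheme (MD5 over sorted key/value pairs) and both function names are assumptions for illustration, not the project's dict_to_uuid; the example stream and reader names are placeholders.

```python
import hashlib
import uuid

def dict_to_uuid_sketch(key):
    """Derive a deterministic UUID from a dict by hashing its sorted items."""
    hashed = hashlib.md5()
    for k, v in sorted(key.items()):
        hashed.update(str(k).encode())
        hashed.update(str(v).encode())
    return uuid.UUID(hex=hashed.hexdigest())

def stream_hash(stream_type, stream_reader):
    # Only these two fields feed the hash; reader kwargs are left out on purpose.
    return dict_to_uuid_sketch(
        {"stream_type": stream_type, "stream_reader": stream_reader}
    )

hash_a = stream_hash("ExampleStream", "example.reader.ReaderA")
hash_a_again = stream_hash("ExampleStream", "example.reader.ReaderA")
hash_b = stream_hash("ExampleStream", "example.reader.ReaderB")

print(hash_a, hash_b)
```

Because kwargs do not enter the hash, two entries that differ only in reader kwargs would share a stream_hash, which is the edge case the spec update refers to.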
* refactor: clean up load_metadata and SPEC documentation
Code consolidation (~430 lines removed):
- Remove extract_stream_types_from_device() - inline as list comprehension
- Remove _infer_device_type_from_rig() - redundant with _extract_device_mapper_from_rig()
- Remove extract_epoch_config() and ingest_epoch_metadata() - dead code
- Remove unused imports (DotMap, io_api, np)
- Consolidate get_device_info() device iteration into single loop
Documentation cleanup:
- Update module docstring to describe current architecture
- Remove "Step 1/2/3" SPEC terminology from docstrings
- Condense historical justifications in SPEC_STREAMS_MAKER.md
- Remove "old vs new" design comparisons
- Update SPEC_TESTING.md to reflect removed functions
Test updates:
- Rename TestExtractStreamTypesFromDevice to TestGetDataReaderMethods
- Remove TestInferDeviceTypeFromRig (function removed)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
* refactor: remove legacy tests in favor of testcontainers-based tests
Remove 4 legacy test files that required dj_local_conf.json and exp0.2 data:
- test_acquisition.py (redundant with TestEpochIngestion, TestChunkIngestion)
- test_pipeline_instantiation.py (low value, implicitly tested)
- test_qc.py (trivial count assertion)
- test_tracking.py (exp0.2-specific with unused save_test_data function)
Clean up conftest.py:
- Remove orphaned fixtures (_dj_config, pipeline, _experiment_creation, etc.)
- Remove helper functions (load_pipeline, drop_schema, data_dir)
- Remove test_params fixture with hard-coded exp0.2 values
Update SPEC_TESTING.md:
- Update directory structure to reflect current state
- Rename test_ingestion_golden.py to test_full_ingestion.py
- Remove unused slow marker
Net change: 357 lines removed; test coverage is maintained by the new testcontainers tests.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
* feat: add DeviceName table as primary key for ExperimentDevice tables
Replace device_serial_number with device_name as the primary key for
ExperimentDevice tables. This aligns with Pydantic rig configurations
where device names (e.g., "CameraTop", "Feeder1") are the natural identifiers.
Changes:
- Add DeviceName lookup table (device_name PK, FK to DeviceType)
- Update get_device_template() to use ->DeviceName as PK
- Make device_serial_number an optional attribute (nullable)
- Update insert_device_types() to populate DeviceName table
- Update ingest_epoch_metadata_from_rig() to use device_name as key
- Delete streams.py (auto-regenerated with new schema)
Benefits:
- Natural query keys: & {"device_name": "CameraTop"}
- Devices without serial numbers (e.g., LightCycle) now supported
- COM port uniqueness issue resolved (device_name is unique per experiment)
- Hardware tracking preserved via optional device_serial_number attribute
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
* fix: strip timezone from chunk timestamps for naive datetime comparison
io_api.load() returns UTC-aware timestamps (via pd.to_datetime(utc=True)),
but epoch directory names are parsed with datetime.strptime() which returns
naive timestamps. Comparing the two raises TypeError. MySQL datetime columns
also reject timezone offsets (+00:00).
Fix: call tz_localize(None) on chunk data immediately after loading, in both
ingest_epochs and ingest_chunks. This strips the UTC offset while preserving
the datetime values, keeping all downstream comparisons and DB inserts naive.
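The failure mode and the fix can be reproduced with the standard library alone. This is a minimal analogue, not the pipeline code: the commit applies `tz_localize(None)` to pandas data, whereas the sketch uses `datetime.replace(tzinfo=None)`, which has the same effect on a single timestamp. The directory-name format string is an assumption.

```python
from datetime import datetime, timezone

# Naive timestamp, as produced by datetime.strptime() on an epoch directory name
# (the format string here is an assumed example).
epoch_start = datetime.strptime("2024-01-31T09-00-00", "%Y-%m-%dT%H-%M-%S")

# UTC-aware timestamp, standing in for a value loaded with utc=True.
chunk_time = datetime(2024, 1, 31, 10, 0, 0, tzinfo=timezone.utc)

# Ordering comparisons between naive and aware datetimes raise TypeError.
try:
    chunk_time > epoch_start
    comparison_failed = False
except TypeError:
    comparison_failed = True

# Dropping tzinfo keeps the wall-clock values and makes the comparison valid.
naive_chunk_time = chunk_time.replace(tzinfo=None)
is_later = naive_chunk_time > epoch_start
```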
* fix: handle MariaDB json-as-longtext for metadata insert and fetch
MariaDB 10.3 aliases the `json` column type to `longtext`. DataJoint
reads the actual MySQL column type, so attr.json is False and the
automatic json.dumps()/json.loads() serialization never fires.
Insert side (acquisition.py): manually json.dumps(epoch_config["metadata"])
before EpochConfig.Meta.insert1(). Placed after ingest_epoch_metadata_from_rig()
which needs the dict.
Fetch side (load_metadata.py): guard with isinstance(str) check and
json.loads() in get_stream_reader_for_epoch() before passing metadata
to Pydantic model_validate(), which expects a dict.
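The insert-side and fetch-side handling described above can be sketched with the standard library. `ensure_metadata_dict` is a hypothetical helper name and the sample `rig_config` is made up for illustration. The real code applies the same guard inside get_stream_reader_for_epoch().

```python
import json


def ensure_metadata_dict(metadata):
    """Return metadata as a dict, decoding it first if the database returned a JSON string."""
    if isinstance(metadata, str):
        return json.loads(metadata)
    return metadata


# Illustrative rig configuration (not taken from a real experiment).
rig_config = {"devices": {"CameraTop": {"fps": 50}}}

# Insert side: serialize manually, because the column is seen as longtext.
stored_value = json.dumps(rig_config)

# Fetch side: the guard accepts either the raw string or an already-decoded dict.
from_string = ensure_metadata_dict(stored_value)
from_dict = ensure_metadata_dict(rig_config)
```

The guard keeps the fetch path working on backends that do return a decoded dict, so the same code can serve MySQL and MariaDB.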
* fix: handle stale module state and json metadata in integration tests
The full_pipeline fixture now patches db_prefix and streams_maker.schema_name
to match the golden test config, clears cached streams module, and re-activates
pipeline schemas when other integration tests have set a different prefix.
Also handle MariaDB json-as-longtext for metadata assertions in tests.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: clean streams.py to catalog-only baseline for stable test imports
streams.py previously contained experiment-specific dynamic tables that
caused test failures due to stale schema references on import. Now the
committed file has only catalog tables (StreamType, DeviceType, DeviceName,
Device) while streams_maker.main() continues to append dynamic tables at
runtime. Test fixtures use schema.activate() with database reset to rebind
schemas across different test prefixes without needing file deletion.
Also fixes ruff lint/format violations, adds save/restore of db_prefix in
fixture teardown to prevent state leakage, and adds logging to teardown
exception handlers.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* refactor: address PR #524 review feedback for load_metadata.py
- Move `import typing` and `import re` to module-level imports instead
of importing inside individual functions
- Remove duplicate `import json` inside get_stream_reader_for_epoch
(already imported at module level)
- Fix Pydantic V3 deprecation: use `type(rig).model_fields` instead of
`rig.model_fields` (instance-level access will break in Pydantic V3)
- Fix ruff B007: rename unused loop variables to underscore-prefixed
- Apply ruff format to normalize quote style
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* refactor: address remaining PR #524 review feedback
- Move dict_to_uuid to utils/hashing.py (re-exported from utils/__init__
for backward compatibility)
- Fix device_sn type hint: dict[str, str | None] (values can be None)
- Use setdefault for device_info dict initialization
- Use list comprehension for table_attribute_entry construction
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* refactor: use cls.__name__ for device type and @data_reader for device detection
Replace device_type field (which gives inherited parent class names like
"HarpOutputExpander" for Feeder) with cls.__name__ for DeviceType catalog
entries. Detect devices by presence of @data_reader methods instead of
device_type field, decoupling from the deprecated device_type property.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
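The difference between an inherited device_type field and cls.__name__, and detection by decorated methods, can be illustrated with stand-in classes. The `data_reader` decorator below is a simplified stand-in that only tags the function; it is not the real decorator from swc.aeon.schema. `has_data_readers` and the `Note` class are hypothetical.

```python
def data_reader(func):
    """Stand-in decorator (assumption): marks a method as a data reader."""
    func._is_data_reader = True
    return func


class HarpOutputExpander:
    device_type = "HarpOutputExpander"


class Feeder(HarpOutputExpander):
    # device_type is inherited, so it still reports the parent class name.
    @data_reader
    def beam_break(self):
        return "BitmaskEvent"


class Note:
    """A class with no reader methods, so it is not treated as a device."""


def has_data_readers(cls) -> bool:
    """Treat a class as a device if any attribute carries the reader tag."""
    return any(
        getattr(getattr(cls, name, None), "_is_data_reader", False)
        for name in dir(cls)
    )


inherited_name = Feeder.device_type  # parent name leaks through inheritance
catalog_name = Feeder.__name__       # the concrete class name
```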
* ci: add unit test workflow for DJ pipeline PRs
Runs pytest -m unit on PRs to datajoint_pipeline branch using uv.
No database or golden datasets required.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* ci: add integration tests to DJ pipeline CI workflow
Run unit and integration tests as parallel jobs. Integration tests use
testcontainers MySQL (no golden datasets). Golden-dataset tests skip
automatically in CI.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* docs: update SPEC files to match current implementation
Sync both spec files with actual code after cls.__name__, @data_reader
detection, and CI workflow changes.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: strengthen stream tests and add fetch_stream integration tests
- Assert sample_count > 0 in stream data tests (not just entry existence)
- Add TestFetchStream class with 5 tests: DataFrame return, video columns,
harp columns, drop_pk behavior, and timestamp rounding
- Update uv.lock for aeon_exp_foragingABC Rig fix (BaseSchema inheritance)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* refactor: adopt aeon_exp package path convention
Update devices_schema import strings from swc.aeon.exp to swc.aeon_exp
to match the convention established in the production linear-drive branch
of aeon_exp_foragingABC. Update uv.lock to latest tn/data-api-updated.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* chore: update aeon_exp_foragingABC to main branch, swc-aeon to v0.2.0
aeon_exp_foragingABC main is now the production branch (tn/data-api-updated
merged). swc-aeon updated to v0.2.0 to satisfy new dependency requirement.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix(ci): split test extras to avoid private repo dependency in CI
Move swc-aeon-rigs-foragingabc from [test] to [test-golden] extra.
CI uses --extra test (no private repo access needed), local dev uses
--extra test-golden for golden dataset tests.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix(ci): skip foragingABC-dependent tests when package not installed
TestPopulateCatalogFromPydantic imports swc.aeon_exp at runtime via
get_experiment_pydantic(). Skip gracefully in CI where the private
swc-aeon-rigs-foragingabc package is not installed.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
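One way to detect an optional package without importing it is sketched below, using only the standard library. `package_available` and `HAS_AEON_EXP` are hypothetical names; the commits do not show how the skip is implemented at this point.

```python
import importlib.util


def package_available(name: str) -> bool:
    """Return True if `name` can be located, without importing the module itself."""
    try:
        return importlib.util.find_spec(name) is not None
    except ModuleNotFoundError:
        # Raised when a parent package of a dotted name is missing.
        return False


# False in environments (such as CI) where the private package is absent.
HAS_AEON_EXP = package_available("swc.aeon_exp")
```

A flag like this could then feed a `pytest.mark.skipif(not HAS_AEON_EXP, reason="...")` guard on the affected test class.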
* fix: pin requires-python to <3.13 for numpy compatibility
numpy<2 from aeon_exp_foragingABC is unsolvable for Python 3.13+.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* Use module level pytestmark
* Remove tox.ini - config should go into pyproject.toml
* Run ryuk as a privileged container
* Reorganise dependencies in pyproject.toml
* Combine uv setup and python installation in test_dj_pipeline workflow
* Update uv.lock
* Update pre-commit hook versions
* Remove UP038 rule (no longer available)
* Remove unused dict_to_uuid import
* Remove non-existent and unused imports + ruff errors
- PLR2004 Magic value
- SIM102 Use a single `if` instead of nested
- E501 Line too long
* Update uv.lock
* Move example usage to create_experiment docstring
* Store reused string as variable
* Fix device_sn return typehint
* Fix unrecognised Pyright settings
* Add default excludes to pyright config
* Enable pyright for dj_pipeline, disable for tests and docs
* Refactor require_golden_data fixture
* Add ephys pipeline (#545)
Add ephys schemas (v0.2.0).
Adds spike sorting, manual curation, and cross-session unit matching capabilities to aeon_mecha.
* fix: remove auto-generated stream tables from streams.py
Keep only the header + catalog tables (StreamType, DeviceType,
DeviceName, Device). Generated stream tables (FeederEncoder,
CameraVideo, etc.) are created at runtime by streams_maker.main()
and should not be committed.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix(ci): add dj_config_integration fixture to codec encode tests
The encode tests need the mysql_container running before importing
AeonStreamCodec (which triggers datajoint import/connection). Adding
dj_config_integration ensures the testcontainers MySQL is ready.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: rename DJ_DB_PREFIX to DJ_DATABASE_PREFIX for consistency
Aligns env var name with dj.config.database.database_prefix convention.
Addresses PR #537 review comment.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* refactor: migrate ephys pipeline modules to DJ 2.x fetch API
- fetch(as_dict=True) → to_dicts()
- fetch("col") → to_arrays("col")
- fetch("KEY") → keys()
- fetch(format="frame") → to_pandas()
- fetch(..., limit=1, as_dict=True)[0] → (query & dj.Top(limit=1)).fetch1()
- fetch(..., download_path=) → dj.config.override(download_path=) context
- fetch("KEY", "col1", "col2") → to_dicts() + extract keys/values
Files: ephys.py, spike_sorting.py, spike_sorting_curation.py, ephys_utils.py
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* refactor: migrate ephys scripts to DJ 2.x fetch API
Same patterns as pipeline modules, plus:
- dj.config["custom"]["database.prefix"] → dj.config.database.database_prefix
- fetch(..., limit=1)[0] → (query & dj.Top(limit=1)).fetch1()
- fetch("col1", "col2", as_dict=True) → .proj("col1", "col2").to_dicts()
Files: ephys_mock_ingestion.py, ephys_v2_qc_plots.py, ephys_v2_setup.py,
ephys_v2_validate.py
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* docs: update spec examples to DJ 2.x fetch API
Update code examples in SPEC_unit_matching.md and SPEC_STREAMS_MAKER.md
to use to_dicts(), to_arrays() instead of deprecated fetch() patterns.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: clean streams.py — keep only catalog tables, remove test-generated code
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* docs: add note about test-generated streams.py in SPEC_TESTING
Reminds developers to discard test-generated stream tables from
streams.py before committing.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* feat: rework `streams_maker` utils to be compatible with pydantic-based Experiment/Rig definition
* docs: clean up STREAMS_MAKER.md
* chore: update swc-aeon dependency to version 0.1.0 and add source configuration for uv
* refactor: streamline metadata ingestion in EpochConfig by consolidating imports and updating metadata file handling
* chore: update version to 0.2.2 and remove unused schema definitions in ingestion_schemas.py
* test: add integration tests for load_new_metadata with testcontainers
- Add testcontainers MySQL fixture for zero-config database testing
- Create 24 integration tests covering:
- extract_stream_types_from_device (closure handling for real @data_reader)
- get_device_info and get_stream_entries
- insert_stream_types with duplicate handling
- insert_device_types with FK constraint recovery
- Create 17 unit tests for pure functions (no DB required)
- Use real @data_reader decorator from swc.aeon.schema (no mocking)
- Fix extract_stream_types_from_device to check closure for original function
- Add SPEC_TESTING.md documenting test architecture and patterns
- Remove swc-aeon-rigs from test dependencies (not needed)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
* docs: move STREAMS_MAKER spec to docs/specs/
Relocate spec from aeon/dj_pipeline/utils/STREAMS_MAKER.md to
docs/specs/SPEC_STREAMS_MAKER.md for consistency with SPEC_TESTING.md.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
* refactor: move dict_to_uuid to utils module
Move dict_to_uuid function from aeon/dj_pipeline/__init__.py to
aeon/dj_pipeline/utils/__init__.py for better code organization.
Re-export from __init__.py for backward compatibility.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
* refactor: rename load_new_metadata to load_metadata
Remove "new" from the module name now that legacy load_metadata.py no longer exists.
Update all imports, test files, and spec documentation.
Files renamed:
- aeon/dj_pipeline/utils/load_new_metadata.py → load_metadata.py
- tests/.../test_load_new_metadata_unit.py → test_load_metadata_unit.py
- tests/.../test_load_new_metadata_integration.py → test_load_metadata_integration.py
Also adds uv.lock for reproducible builds.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
* docs: clarify test classification and add sample metadata fixture
- Restore corrupted spec files (SPEC_TESTING.md, SPEC_STREAMS_MAKER.md)
- Clarify test tiers: unit, integration (schema + ingestion), specialized
- Add golden datasets vs sample fixtures distinction
- Add ForagingABC_Metadata.json as sample fixture for parsing tests
- Add 17 new tests for ForagingABC metadata parsing
- Update pytest markers to reflect clearer classification
- Bump version to 0.2.3
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
* refactor: extract experiment-agnostic core DataJoint pipeline
Remove experiment-specific code to create a clean, reusable pipeline template:
Removed:
- aeon/schema/ - obsolete DotMap-based device schemas (replaced by Pydantic)
- dj_pipeline/analysis/ - experiment-specific analysis tables
- dj_pipeline/scripts/ - one-off maintenance scripts
- dj_pipeline/create_experiments/ - experiment-specific setup scripts
- dj_pipeline/report.py - experiment-specific reporting
- utils/block_plotting.py, plotting.py, tracking_utils.py
Modified:
- acquisition.py: remove Environment tables (to be Pydantic devices)
- tracking.py: keep only SLEAPTracking, add _get_stream_reader() helper
- worker.py/process.py: remove analysis_worker
- streams_maker.py: add resilience for missing DB connection
Added:
- utils/create_experiment.py: generic experiment creation utility
Core schemas retained: lab, subject, acquisition, tracking (SLEAP),
streams (via streams_maker), qc
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
* refactor: upgrade streams_maker to Pydantic-based approach
- Change StreamType PK from stream_type to stream_hash (UUID)
to handle same stream_type with different reader implementations
- Update EpochConfig.Meta to use JSON type for metadata field
storing original rig_config for Pydantic reconstruction
- Add get_stream_reader_for_epoch() helper for runtime reader resolution
from database-stored metadata without file I/O
- Update get_device_stream_template() make() method to use Pydantic approach
- Remove legacy aeon_schemas references from generated streams.py
- Update SPEC_STREAMS_MAKER.md with new architecture and design decisions
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
* test: update integration tests for new streams schema
- Update test assertions to match new schema (stream_hash PK, no kwargs)
- Fix test fixture to drop existing schema before creating tables
- Update spec to match actual get_device_info() return structure
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
* refactor: simplify Pydantic schema loading with module:Class format
- Rename get_experiment_class() to get_experiment_pydantic()
- Use colon separator format (module.path:ClassName) for schema paths
- Remove extract_rig_from_metadata() and extract_rig_from_metadata_dict()
- Simplify get_stream_reader_for_epoch() to use Experiment.model_validate()
- Remove duplicate _get_stream_reader() from tracking.py
- Update DevicesSchema docstring to clarify new format
- Update SPEC_STREAMS_MAKER.md with new architecture and design decisions
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
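Resolving a `module.path:ClassName` string can be sketched with importlib. `load_class` is a hypothetical helper, not the body of get_experiment_pydantic(), and the example resolves a standard-library class so that it runs anywhere.

```python
import importlib


def load_class(path: str):
    """Resolve a 'module.path:ClassName' string to the class object."""
    module_path, sep, class_name = path.partition(":")
    if not sep or not module_path or not class_name:
        raise ValueError(f"expected 'module.path:ClassName', got {path!r}")
    module = importlib.import_module(module_path)
    return getattr(module, class_name)


# Example using a standard-library class in place of an experiment schema.
resolved = load_class("collections:OrderedDict")
```

The colon keeps the module part and the attribute part unambiguous, so no guessing is needed about where the dotted module path ends.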
* feat: add stream_reader_kwargs to StreamType for reader constructor args
Implement dynamic storage and retrieval of reader constructor kwargs
(e.g., BitmaskEvent needs value/tag, Harp needs columns) using
inspect.signature to extract parameters from reader instances.
Key changes:
- Add stream_reader_kwargs field to StreamType catalog table
- Add _extract_kwargs_from_reader() using inspect.signature
- Add get_reader_kwargs_from_device_class() for @data_reader methods
- Update populate_catalog_from_pydantic() to extract and store kwargs
- Update get_device_stream_template() to use stored kwargs
Architecture changes (Three Decoupled Steps):
- Step 1: Catalog population at worker startup (not in EpochConfig.make)
- Step 2: Table creation at worker startup via streams_maker.main()
- Step 3: Data population via make() calls (DML only)
Test changes:
- Fix fixture ordering: streams_maker.main(create_tables=False) before
populate_catalog_from_pydantic()
- Add golden dataset integration tests (test_full_ingestion.py)
- Update test expectations for stream_reader_kwargs field
Previously failing streams now work:
- HarpOutputExpanderDeliverPellet (BitmaskEvent with value/tag)
- HarpOutputExpanderBeamBreak (BitmaskEvent with value/tag)
- WeightScaleBaselineEvent (Harp with columns)
- HarpOutputExpanderRetriedDelivery (BitmaskEvent with value/tag)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
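The inspect.signature approach mentioned above can be sketched as follows. `BitmaskEvent` is a stand-in class defined only for this example, and `extract_kwargs_from_reader` is a simplified guess at the idea, not the project's `_extract_kwargs_from_reader()`. Skipping `pattern` is an assumption.

```python
import inspect


class BitmaskEvent:
    """Stand-in reader (assumption) whose constructor needs value and tag."""

    def __init__(self, pattern, value, tag):
        self.pattern = pattern
        self.value = value
        self.tag = tag


def extract_kwargs_from_reader(reader, skip=("self", "pattern")):
    """Collect constructor arguments that the instance still exposes as attributes."""
    params = inspect.signature(type(reader).__init__).parameters
    return {
        name: getattr(reader, name)
        for name in params
        if name not in skip and hasattr(reader, name)
    }


reader = BitmaskEvent("Feeder1_32", value=0x22, tag="PelletDetected")
stored_kwargs = extract_kwargs_from_reader(reader)

# The stored kwargs are enough to rebuild an equivalent reader later.
rebuilt = BitmaskEvent("Feeder1_32", **stored_kwargs)
```

Reading the parameter names from the signature avoids hardcoding per-reader attribute checks, at the cost of assuming each constructor argument is stored under the same name.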
* Simplify directory path construction in create_experiment
* Disable reportOperatorIssue in pyright config
- datajoint operator "&" not supported for types
* Break load_metadata/streams_maker import cycle and simplify code
- replace streams_maker.schema_name with get_schema_name("streams")
- minor simplifications to _extract_kwargs_from_reader, extract_active_regions, and get_device_info
- suppress reportUnusedFunction error for _flatten_rig_devices, _extract_device_mapper_from_rig
* Downgrade reportPossiblyUnbound to warning in pyright config
* Ignore docstrings, magic number checks for ruff linting in tests
* Temporarily downgrade reportImportCycles to warning in pyright config
* Rename _flatten_rig_devices to flatten_rig_devices
* Refactor get_data_reader_methods and get_reader_kwargs_from_device_class
* Use set for device_stream_map for unique stream type and hash pairs
* Bump version to 0.3.0 in pyproject.toml
* Refactor dedupe logic in catalog population functions + simplify device mapper signature
* Fix None handling in extract_active_regions
* Update uv.lock
* cleanup: remove swc-aeon-rigs dependency, update SPEC_STREAMS_MAKER
- Remove swc-aeon-rigs from uv.sources (no longer a dependency)
- Update package dependency diagram: remove swc.aeon.rigs column
- Fix Rig(DataSchema) → Rig(BaseSchema) in docs and examples
Addresses review comments from @lochhh on PR #524.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: restore single-prefix test fixtures and fix DJ 2.x compatibility
- Restore single TEST_DB_PREFIX approach (eliminates session-scope fixture
conflict between load_metadata and golden dataset tests)
- Fix _flatten_rig_devices → flatten_rig_devices rename in acquisition.py
- Add semantic_check=False to key_source join in streams_maker
- Fix remaining DJ 0.14 config patterns in test fixtures
Some integration test failures remain due to function signature mismatches
between Chang Huan's load_metadata refactors and our test fixtures — to be
addressed in follow-up.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: resolve remaining DJ 2.x compatibility issues after merge
- Add semantic_check=False to aggr join in load_metadata.py
(previous_epoch lineage mismatch)
- Fix get_device_mapper_from_rig() call signature in tests
(metadata_filepath arg removed in refactor)
- Revert key_source to * join (lineage check skipped when
~lineage table absent in testcontainers)
All 55 integration tests pass.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: use join(semantic_check=False) instead of * for key_source
The * operator triggers DJ 2.x lineage checks which fail when Chunk and
ExperimentDevice have epoch_start with different lineages. Using
join(semantic_check=False) is production-safe — works with lineage
tables present (persistent DB), not just in testcontainers.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* style: fix ruff linting issues in changed files
- Fix import sorting (I001)
- Fix line length (E501)
- Add docstrings to codec methods (D102)
- Rename unused loop vars (B007)
- Add D101, S110, SIM105 to test ruff ignores
- Shorten table definition comment
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: rename _flatten_rig_devices to flatten_rig_devices in unit tests
Matches the public rename done in load_metadata.py refactor.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix(ci): skip tests requiring private aeon_exp package in CI
- Add pytest.importorskip for swc.aeon_exp in test_full_ingestion.py
(from lochhh's commit 8c4c5c9)
- Add skipif guard on TestPopulateCatalogFromPydantic
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: address review findings — DJ 2.x cleanup pass
- dj.schema() → dj.Schema() in ephys.py, spike_sorting.py,
spike_sorting_curation.py
- Remove populate/ folder (DJ 2.x Worker not supported, was removed
in original DJ 2.x upgrade but re-added via merge)
- Remove aeon_ingest script entry point (references removed populate/)
- Reset streams.py to catalog-only (78 lines)
- Fix dj.config['custom'] and dj.config['database.*'] bracket access
in ephys_v2_setup.py
- Update SPEC_TESTING.md code examples to DJ 2.x API
- Remove commented-out DJ 0.14 config in streams_maker.py
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: reset streams.py to catalog-only (test-regenerated again)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* Handle non-finite values in column_stats
* Handle empty index case in timestamp_stats
Co-authored-by: Copilot <copilot@github.com>
* Move stats functions to separate module and update imports
Co-authored-by: Copilot <copilot@github.com>
* Skip checks for hardcoded password and import placement in tests
Co-authored-by: Copilot <copilot@github.com>
* Restore original streams.py in streams_schema teardown
* Update pyright version in pre-commit-config
* Move codec integration tests into test/dj_pipeline/utils
* Remove leftover smoke test
* Use module-level pytestmark for load_metadata tests
* Improve type hint in fetch_streams
* Use module-level pytestmark for full ingesti…
Summary
Pass the pip install command as a list instead of a string in install_package_in_container() github mode, so it works on both Docker and Singularity.

Context
PR #4380 fixed the #egg= deprecation by switching to Direct URL syntax, but the quoted string format only works on Docker. The two container backends split command strings differently:
- Docker (exec_run) uses shlex.split(), which is shell-aware and strips quotes
- Singularity (spython) uses str.split(" "), which splits naively and leaves quotes attached

So a command like pip install "pkg @ url" gets parsed correctly by Docker but breaks on Singularity. Passing the command as a list bypasses both splitting mechanisms, keeping the pkg @ url requirement as a single argument on both backends. This is already the pattern used in runsorter.py.

Tested against real Docker containers with pip 26.0+ simulating both code paths, and confirmed working in a real Singularity workflow.
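The splitting difference described above can be reproduced with the standard library alone. This is a minimal sketch, not the code from this PR: the package name and URL are hypothetical placeholders, and the two split calls only mimic what the description says each backend does internally.

```python
import shlex

# Hypothetical requirement, for illustration only (not a real package or URL).
requirement = "mypkg @ git+https://example.com/org/mypkg.git"

# String form of the command, with the requirement wrapped in quotes.
command_str = f'pip install "{requirement}"'

# Shell-aware split (the behavior described for the Docker path):
# the quotes are consumed and the requirement stays one argument.
docker_args = shlex.split(command_str)

# Naive split on spaces (the behavior described for the Singularity path):
# the requirement is broken apart and the quote characters stay attached.
singularity_args = command_str.split(" ")

# List form: no splitting is needed, so the requirement is a single
# argument regardless of how a backend would have split a string.
command_list = ["pip", "install", requirement]

print(docker_args)
print(singularity_args)
print(command_list)
```

With the string form, the naive split yields five tokens, two of which carry a stray quote character, while the shell-aware split matches the list form exactly. Building the list up front avoids depending on either splitting rule.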
Fixes #4368