Skip to content

Correctness bug bundle: distance, posterior, transitions, sim, viz, HPD (+ position-dim refactor)#29

Merged
edeno merged 22 commits into
mainfrom
correctness-bug-bundle
Jun 11, 2026
Merged

Correctness bug bundle: distance, posterior, transitions, sim, viz, HPD (+ position-dim refactor)#29
edeno merged 22 commits into
mainfrom
correctness-bug-bundle

Conversation

@edeno

@edeno edeno commented May 21, 2026

Copy link
Copy Markdown
Contributor

Summary

A bundle of correctness fixes plus small follow-ups surfaced by a full-repo review and three filed issues (#40#42). Changes are organized into focused, independently-revertible commits. No public API removals; model snapshot and golden-regression outputs are unchanged.

Closes #40
Closes #41
Closes #42

Fixed

Review findings

  • analysis.distance1D.get_map_speed: trailing boundary speed was inserted before the last sample (np.insert(arr, -1, val)), misordering the final two speeds for any chunk with ≥3 time bins. Now appended.
  • _DetectorBase.estimate_parameters: post-EM cleanup of _encoding_model_data never ran (hasattr attribute-name mismatch); fitted models retained spike-time/waveform arrays.
  • ContFrag{SortedSpikes,Clusterless}Classifier.get_posterior: raised on 2D environments (summed a position dim that only exists in 1D). Now collapses every position/*_position dim, guards against an empty dim list, and the two identical bodies are de-duplicated into one helper. state output dim preserved.
  • Environment.fit_place_grid: a tuple place_bin_size with a track_graph crashed deep inside np.linspace; now raises a clear ValidationError with an example.
  • RandomWalk direction-aware transitions: crashed with AttributeError in networkx when no N-D track graph was available; now raises ConfigurationError.
  • figurl_2D.process_decoded_data: the uint16 grid guard had an off-by-one — it rejected valid 256×256 grids (65536 bins, max index 65535 fits). Fixed to key on the largest index (overflow begins at 65537 bins) and extracted into a sortingview-free, unit-testable validate_grid_within_uint16 helper.
  • make_simulated_run_data: hardcoded mark_spacing=10 while replay generators default to MARK_SPACING=5; unified on the module constant.
  • get_HPD_spatial_coverage: only handled 1D; now integrates over the product of per-axis bin widths, supporting 1D (length), 2D (area), 3D (volume), and beyond. 1D/2D results numerically identical to before.

Filed issues

Latent

  • base._convert_seq_to_df position naming capped labels at 4 (x/y/z/w) and silently dropped position columns for 5-6D environments. Convention centralized in non_local_detector._position_dims and shared by all consumers; truncation fixed.

Comprehensive-review follow-ups (silent → loud)

  • distance2D.get_map_estimate_direction_from_track_graph (precomputed_distance=True): on a disconnected track graph an unreachable head/MAP pair kept the -1 path sentinel and silently indexed node_positions[-1], emitting a wrong heading. Unreachable pairs now warn and return NaN (mirrors get_2D_distance).
  • RandomWalk.make_state_transition: a direction= argument was silently ignored unless use_manifold_distance=True on an N-D grid (the default use_manifold_distance=False, or a track_graph, dropped it). Now raises ConfigurationError up front rather than returning an unconstrained walk.

Added

  • get_HPD_spatial_coverage gained an optional bin_width parameter for non-uniform 1D (linearized track-graph) grids. Multi-segment tracks with unequal arms produce on-track bins of different widths (plus edge_spacing gap bins); the uniform default used the first bin's width for every bin, biasing coverage. Pass per-bin widths — e.g. np.diff(environment.edges_[0]) — for the exact integral. Default unchanged (uniform), restricted to a single position dim since multidimensional grids are uniformly spaced. (Determined by tracing every grid path: open-field/single-segment tracks are uniform → center-derived widths equal the true edges; only unequal-arm track graphs are non-uniform, and there the widths must come from the real edges.)

Changed

  • head_direction_simliarity / get_ahead_behind_distance2D: head_direction docstring says (n_time, 2) but code requires a 1-D angle array #41 head_direction_simliarity / get_ahead_behind_distance2D: corrected the head_direction docstring from (n_time, 2) to a 1-D heading-angle array (n_time,). No code change.
  • posterior_consistency_hpd_overlap: Notes block clarifies the asymmetric containment coefficient |A∩B| / min(|A|,|B|). No code change.
  • posterior_consistency_{kl_divergence,hpd_overlap}: corrected the input-shape docstrings (both require flattened (n_time, n_position_bins), not the previously-documented (n_time, n_x_bins, n_y_bins) which raises) and added the missing factor of 2 to the Sørensen-Dice formula. Docstrings only.

Removed

  • environment.order_boundary and get_track_boundary_points: relied on nx.from_scipy_sparse_matrix (removed in NetworkX 3.0) and were unused; the unused NearestNeighbors import was removed too.

Refactor / internal

  • New non_local_detector._position_dims centralizes the position-dim naming convention (get_position_dim_names / get_position_dims), shared by both base.py results constructors, ContFrag*.get_posterior, and get_HPD_spatial_coverage (previously re-derived with three different rules). Direct unit tests pin the 5-6D and >6D branches.
  • Shared synthetic-results test factory (conftest.make_state_bins_results) replaces near-duplicate builders across two test modules.

Test plan

🤖 Generated with Claude Code

edeno and others added 2 commits May 21, 2026 05:23
…erior 2D

Three independently-revertible bug fixes plus their behavioral tests.

- analysis.distance1D.get_map_speed: trailing boundary speed was
  inserted before the last array element instead of appended; for
  every chunk with three or more time bins, the final two speed
  samples were misordered. Switch np.insert(arr, -1, val) to
  np.append(arr, val).

- models.base._DetectorBase.estimate_parameters: post-EM cleanup of
  the encoding-model data did nothing because the hasattr check used
  trailing-underscore form while write sites use leading-underscore.
  Spike-time and waveform-feature arrays were retained for the
  lifetime of every fitted model.

- models.cont_frag_model.ContFrag{SortedSpikes,Clusterless}Classifier
  .get_posterior: raised for 2D environments because it summed over
  the dim name "position" which only exists for 1D. Collapse every
  dim whose name matches "position" or ends with "_position",
  preserving the @staticmethod decorator and the singular "state"
  output dim name.

Tests:
- Parameterized boundary-handling tests for get_map_speed across
  n_time in {3, 4, 5, 100}, plus a regression test that distinguishes
  the np.append output from the buggy np.insert(-1) output.
- Assertion that _encoding_model_data is no longer present after
  estimate_parameters completes.
- 2D and 1D get_posterior tests for both ContFrag classifiers.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Seven small correctness fixes plus tests; one docstring clarification;
one CHANGELOG file.

- environment.Environment.fit_place_grid: tuple place_bin_size with a
  track_graph previously dereferenced a dead local and crashed inside
  np.linspace. Raises ValidationError with example.

- environment: remove order_boundary and get_track_boundary_points
  (both called the removed nx.from_scipy_sparse_matrix, both unused
  outside the module). Drop the now-unused NearestNeighbors import.

- continuous_state_transitions.RandomWalk._handle_no_track_graph:
  direction-aware mode on environments without an N-D track graph
  previously crashed inside networkx; raises ConfigurationError with
  a hint pointing to the two valid configurations.

- visualization.figurl_2D.process_decoded_data: linearized bin indices
  stored as uint16 silently wrapped for grids larger than 65535 cells
  (e.g., 256x256). Raise ValueError at the boundary; keep the uint16
  dtype because the downstream sortingview schema receives the field
  as uint16.

- simulate.clusterless_simulation: replace both hardcoded
  mark_spacing=10 occurrences with the module-level MARK_SPACING
  constant. Encoding marks previously landed at {0,10,20,30} while
  replay generators used {0,5,10,15}, silently invalidating any test
  mixing the two.

- model_checking.highest_posterior_density.get_HPD_spatial_coverage:
  docstring advertised 2D support but the implementation only used a
  1D bin width. Branch on dim names and use the 2D cell area
  (bin_width_x * bin_width_y) when x_position and y_position are
  present. Signature preserved.

- model_checking.posterior_consistency.posterior_consistency_hpd_overlap:
  docstring expanded to flag the asymmetric containment-coefficient
  semantics (|A intersect B| / min(|A|,|B|)) and point readers at
  Sorensen-Dice / Jaccard for symmetric overlap. No code change.

Tests cover each fix in isolation. The figurl overflow tests use
pytest.importorskip("sortingview"); the mark-spacing test asserts
exact replay-side spacing and bounds encoding-side spread below the
pre-fix value (encoding marks have additive Gaussian noise, so the
literal grid is not directly observable).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
edeno and others added 6 commits June 10, 2026 09:12
The overflow guard rejected n_position_bins > 65535, but the values stored
as uint16 are bin *indices* (max index = n_position_bins - 1), so a 256x256
grid (65536 bins, max index 65535) fits in uint16 and was wrongly rejected.
Compare the largest index instead; true overflow begins at 65537 bins.

Extract the check into a module-level validate_grid_within_uint16 helper so
it is importable and unit-testable without sortingview, and rewrite the guard
test to pin the true pass/raise boundary (65536 passes, 65537 raises).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
ContFrag{SortedSpikes,Clusterless}Classifier.get_posterior:
- Guard against an empty position-dim list (unstacked.sum([]) would silently
  return the full per-bin array instead of (time, state)).
- Extract the byte-identical body of both classifiers into a shared module
  helper _state_probabilities_from_results.

RandomWalk: reword the direction-aware ConfigurationError hint, which had
referenced a non-existent Environment(use_manifold_distance=...) argument
(it is a RandomWalk parameter).

Tests: add nD (dim{i}_position) and empty-dim coverage and consolidate the
1D/2D/nD get_posterior tests into one parametrized test.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- HPD 2D coverage: add a multi-bin test exercising the
  sum([x_position, y_position]) reduction; the existing test put all mass on a
  single bin and could not distinguish a multi-axis sum from a single-axis one.
- Simulator MARK_SPACING: derive the regression boundary from
  PLACE_FIELD_MEANS / N_TETRODES / MARK_SPACING instead of hardcoded 4/10/30.
- _encoding_model_data cleanup: add a fast, isolated unit test so the cleanup
  contract is checked even when slow/integration tests are deselected, and
  drop the assertion that was riding along on the slow EM-consistency test.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…nel)

- #40 distance2D.get_map_estimate_direction_from_track_graph: the all-pairs
  nx.shortest_path is a generator under networkx>=3 (not a dict), so .items()
  raised AttributeError. Iterate it directly (also avoids materializing the
  full O(n_nodes^2) dict).
- #41 head_direction_simliarity / get_ahead_behind_distance2D: correct the
  head_direction docstring from (n_time, 2) to a 1-D heading-angle array
  (n_time,), matching the implementation and the 1-D sibling.
- #42 make_single_environment_movie: the multiunit panel went blank after the
  first parallel chunk because the bottom-axis limits were gated on
  frame_idx == 0; only the worker owning frame 0 set them. Initialize on the
  first frame each worker renders via an _ax1_init flag (mirrors the mesh
  pattern). Adds a regression test.

Also adds distance2D track-graph regression tests and strengthens the
distance1D get_map_speed boundary test (asymmetric final edge so the pre-fix
np.insert(-1) swap is observable at every n_time, replacing a degenerate
uniform-edge case), and updates the CHANGELOG for the above plus the figurl
uint16 wording.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… N-D

The canonical position-dim names (position; x/y/z/w/v/u_position for 2-6D;
dim{i}_position beyond six) were re-derived independently in base.py (two
sites), cont_frag_model.py, and highest_posterior_density.py. One base.py
site (_convert_seq_to_df) capped its labels at four (x/y/z/w) and silently
dropped the extra position columns for 5- and 6-D environments.

- Add non_local_detector._position_dims with get_position_dim_names (build the
  names) and get_position_dims (detect them on a DataArray); depends only on
  xarray so both models and model_checking can import it without a cycle.
- Route both base.py constructors, ContFrag*.get_posterior, and
  get_HPD_spatial_coverage through it. Fixes the 5-6D truncation.
- get_HPD_spatial_coverage now integrates over the product of per-axis bin
  widths, supporting 1D (length), 2D (area), 3D (volume) and beyond, instead
  of only 1D/2D. 1D/2D results are numerically identical to before
  (verified max|diff| = 0); adds a 3D regression test.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The state_bins-MultiIndex results Dataset was built by near-identical helpers
in two test files (_make_results_dataset_1d/2d and _make_cont_frag_results).
Extract the shared construction into conftest.make_state_bins_results (keyed on
per-state position columns) and make both files' builders thin wrappers over it
— the paired-column variant directly, the ContFrag variant via a meshgrid. Same
seeds and layouts, so all existing assertions are unchanged.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@edeno edeno changed the title Fix correctness bugs in distance, posterior, transitions, sim, viz Correctness bug bundle: distance, posterior, transitions, sim, viz, HPD (+ position-dim refactor) Jun 10, 2026
edeno and others added 14 commits June 10, 2026 09:59
The comment implied the sum is over named position dims (x_position/y_position),
but after .sel(state=...) the position bins remain stacked under a single
'state_bins' dim, so it sums that. Document the stacked layout and warn against
swapping in get_position_dims (which would return [] and break normalization).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- distance2D.get_map_estimate_direction_from_track_graph (precomputed_distance
  =True): on a disconnected track graph an unreachable head/MAP pair kept the
  -1 path sentinel and silently indexed node_positions[-1], emitting a wrong
  heading. Unreachable pairs now warn and return NaN, mirroring get_2D_distance.
- RandomWalk.make_state_transition: a direction= argument was silently ignored
  unless use_manifold_distance=True on an N-D grid (the default
  use_manifold_distance=False, or a track_graph, dropped it). Now raises
  ConfigurationError up front instead of returning an unconstrained walk.

Both have regression tests verified RED against the prior code; the connected-
graph and direction=None (all model-default) paths are unchanged.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Both posterior_consistency_kl_divergence and posterior_consistency_hpd_overlap
documented an (n_time, n_x_bins, n_y_bins) input alternative that the
implementations do not support (they operate on axis=1 / axis=-1 and raise on a
true 3D array); corrected to flattened (n_time, n_position_bins). Also added the
missing factor of 2 to the Sørensen-Dice formula in the Notes block. Docstrings
only.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Pins get_position_dim_names at n=5,6,7 (the v/u_position and dim{i}_position
branches where the latent 5-6D column-truncation lived) and the
get_position_dims round-trip / non-position-dim exclusion, locking the
single-source-of-truth convention at the seam rather than only via consumers.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…rids

Linearized track-graph environments with unequal segment lengths produce
on-track bins of different widths (and edge_spacing gap bins), so the uniform
default — which uses the first bin's width for every bin — biases the coverage
integral. Add an optional bin_width parameter (per-bin widths, e.g.
np.diff(environment.edges_[0])) that integrates the exact bin measure; it is
restricted to a single position dimension since multidimensional grids are
uniformly spaced. The default path is unchanged and verified numerically
identical (max|diff|=0) for 1D/2D; non-uniform widths cannot be recovered from
bin centers, so passing real edges is the only exact route.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The all-pairs form of nx.shortest_path returns {source: {target: path}} as a
dict in networkx 3.0-3.4 but a generator of (source, paths) pairs in >=3.5.
The /simplify pass dropped the dict() wrapper to iterate directly, which is
only valid for the generator form; under the older networkx on the Python 3.10
CI runner it iterated the dict's keys and raised 'cannot unpack non-iterable
int object'. dict(...) normalizes both (the package pins networkx>=3.0).

Verified the three track-graph tests pass against both networkx 3.2.1 (dict)
and 3.6.1 (generator).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Replaces the unconditional dict(...) materialization with an isinstance branch:
the all-pairs nx.shortest_path is a dict in networkx 3.0-3.4 (use .items()) and
a generator in >=3.5 (iterate lazily). Iterating the generator one source at a
time avoids holding all O(n_nodes^2) shortest paths simultaneously, which
matters for large track graphs, while staying correct on the older dict form.

Verified the track-graph tests pass against both networkx 3.2.1 (dict) and
3.6.1 (generator).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
analysis.distance2D.get_map_estimate_direction_from_track_graph with
precomputed_distance=False: a disconnected head/MAP pair made
nx.shortest_path(source, target) raise nx.NetworkXNoPath, which was
uncaught and crashed. The sibling precomputed_distance=True branch (and
get_2D_distance) already warn and return NaN for unreachable pairs; the
non-precomputed branch now does the same, keeping the existing
source==target IndexError catch.

Tests:
- test_nonprecomputed_direction_disconnected_graph_is_nan (RED against
  the prior code, which raised NetworkXNoPath).
- test_precomputed_distance_dict_branch_matches_generator: monkeypatch
  the all-pairs nx.shortest_path to its networkx<3.5 dict shape and
  assert the .items() fallback matches the >=3.5 generator path, so that
  branch is covered regardless of the installed networkx version.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
make_simulated_run_data again passes mark_spacing=10 at both call sites
(matching origin/main), so this branch no longer alters simulator output.
The earlier change unified encoding on the module MARK_SPACING (=5), which
silently halved the encoding mark-center spacing -- a behavioral change
that does not belong in an output-preserving correctness bundle.

Also drops the now-inapplicable module-docstring note, the CHANGELOG
entry, and test_mark_spacing_consistent_between_encoding_and_replay,
whose premise (encoding and replay share a spacing) no longer holds. The
encoding/replay mark-spacing mismatch is a pre-existing latent issue left
for a separate, deliberate change.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
highest_posterior_density.get_HPD_spatial_coverage:
- Raise a descriptive ValueError for a length-1 position axis (was a bare
  numpy IndexError from np.diff(...)[0]) and for non-monotonic/descending
  position coordinates (previously produced a silent negative/wrong
  spatial measure).
- bin_width branch: reduce over the position axis by name rather than the
  positional axis=-1, so the result no longer silently assumes a
  time-leading layout (the uniform branch already reduced by name).
- Docstring: "total spatial area" -> "total spatial measure"
  (length/area/volume) and document the strictly-increasing requirement.

posterior_consistency.kl_divergence / hpd_overlap: add a shared
_validate_consistency_inputs guard that rejects non-2-D or shape-mismatched
input. The docstrings already require flattened (n_time, n_position_bins);
3-D input previously either reduced only the last axis (kl) or ran on
axis=1 (overlap), silently returning a wrong-shaped/incorrect result.

Tests cover the length-1, descending, and 1D bin_width shape-mismatch
ValueErrors and the 2-D/shape enforcement (all RED against the prior
code). Valid 1D/2D/3D coverage results are numerically unchanged (golden,
snapshot, and the exact nonuniform-bin_width regression all pass).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…e update

visualization.movie._render_single_env_frame: the per-worker ax1 limit
init ran after multiunit_line.set_data, whose rate-window indexing can
raise IndexError near a recording boundary (swallowed by the surrounding
except). A worker whose opening frames all land in that edge region would
skip the init and keep default ax1 limits, re-introducing the off-screen
firing-rate bug for boundary chunks. Move the init ahead of set_data (the
limits depend only on window_ind and rate.max(), not the frame), and log
the swallowed IndexError at debug level instead of passing silently.

Tests: a boundary chunk whose first frame overruns the rate array still
sets the limits (RED against the prior ordering), and the limits are set
once per worker (not reset on later frames).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…cabulary

_position_dims.get_position_dim_names now raises ValueError for
n_position_dims < 1 instead of silently returning [] (which would build a
degenerate empty MultiIndex level set downstream). get_position_dims now
matches only the closed vocabulary it emits -- "position", the labelled
x/y/z/w/v/u_position names, and the numbered dim{i}_position fallback --
so a stray *_position dim (e.g. head_position, map_position,
linear_position) is no longer mistaken for a decoder position axis and
wrongly marginalized.

Adds direct tests for the non-positive-dimension rejection and the
underscore-suffix false-positive exclusion.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
test_random_walk_inward_no_trackgraphDD_raises matched
"direction-aware RandomWalk", a phrase present in both the up-front
make_state_transition guard and the deeper _handle_no_track_graph guard,
so it did not pin which one fired. Match "requires an N-D track graph",
unique to the deep guard this test exercises.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ut case

TestPosteriorConsistencyHPDOverlap::test_rejects_3d_input used shape
(2, 3, 3), where n_time != n_x made the pre-guard threshold broadcast
*incidentally* raise -- so the test was RED for the wrong reason (a
cryptic broadcast error), not because it exercised the genuinely silent
path. Use (3, 3, 3) (n_time == n_x): the pre-guard code's
`posterior >= threshold[:, None]` then broadcasts successfully and the
function silently returns a wrong-shaped (3, 3) array, which is the
silent failure _validate_consistency_inputs prevents. Verified RED
against the pre-guard source (DID NOT RAISE) and green after.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@edeno edeno merged commit e79a7b6 into main Jun 11, 2026
9 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment