Skip to content

fix: array creation in tests#169

Draft
flying-sheep wants to merge 2 commits into
mainfrom
fix-array-create
Draft

fix: array creation in tests#169
flying-sheep wants to merge 2 commits into
mainfrom
fix-array-create

Conversation

@flying-sheep
Copy link
Copy Markdown
Collaborator

@flying-sheep flying-sheep commented May 19, 2026

so what I thought happens is that they split up codecs into

  1. shards: if present, this contains the parameters for an implicitly created ArrayToArrayCodec that wraps all other given codecs and applies them for each shard individually
  2. filters: list of ArrayToArrayCodecs
  3. serializer: a single ArrayToBytesCodec
  4. compressors: a list of BytesToBytesCodecs

this assumption seems to hold for e.g. tests/test_codecs.py::test_order, but other tests fail, so apparently I didn’t understand this.

Comment thread tests/test_sharding.py
Comment on lines +236 to +243
shards=ShardsConfigParam(
shape=(32, 32, 32), index_location=outer_index_location
),
dtype=data.dtype,
fill_value=0,
codecs=[
ShardingCodec(
chunk_shape=(32, 32, 32),
codecs=[
ShardingCodec(
chunk_shape=(16, 16, 16), index_location=inner_index_location
)
],
index_location=outer_index_location,
)
],
serializer=ShardingCodec(
chunk_shape=(16, 16, 16), index_location=inner_index_location
),
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having both of these seems wrong?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Before, it was the following, no idea how to express that the new way.

codecs=[
    ShardingCodec(
        chunk_shape=(32, 32, 32),
        codecs=[
            ShardingCodec(
                chunk_shape=(16, 16, 16), index_location=inner_index_location
            )
        ],
        index_location=outer_index_location,
    )
],

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok this one specificully is for nested sharding. I think this one is specifically testing nested sharding but I dont know if the way you descibed it is valid, so cc @d-v-b again!

Comment thread tests/test_sharding.py
shape=data.shape,
chunk_shape=data.shape,
chunks=data.shape,
shards=ShardsConfigParam(
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I don't even know what this is, I see it's in the type, but it's undocumented:

Image

that shape parameter could mean a two things, as an example of why I don't know how it works: chunks within the shard arrangement or the overall shape

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cc @d-v-b

Copy link
Copy Markdown
Collaborator Author

@flying-sheep flying-sheep May 19, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the param docstring for shards needs work:

  • the rST type contains only tuple[int, ...]
  • the free text contains "auto" and "keep" as well
  • the type contains Sequence[Sequence[int]] and that TypedDict allowing to specify index_location

this drift between type annotation and docstring is why I make sure to use the annotations in scanpy …

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agree this is clunky. we goofed by having 2 flavors of array creation routines (create and create_array) X two function colors, each wrapped by multiple higher-level functions.

The idea behind the ShardsConfigParam is to give you some control of the sharding codec configuration via the shards parameter. If you don't need to specify where the index is, then ignore it and just pass the outer chunk shape you want to the shards param.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, so it’s just a docs issue and you should

  1. document the exported location of ShardsConfigParam
  2. unify the 2 separate locations in the docstring where you explain what’s accepted by reflecting what the type annotation already says.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and we should also probably make the ShardsConfigParam model the full configuration space of the sharding_indexed codec (you can't set the index codecs on it, iirc)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants