
processors controller docs added to the repo.#209

Open
predragmacura wants to merge 1 commit into develop from processor-controller-docs


@predragmacura


Hey guys, the current user-docs structure is extended with the processors controller docs. You can find all new documents in the transformations section at the end of the advanced guide.

Please go through the documents carefully and verify that what's stated matches the actual codebase and current product behavior.

Please note that this is just one part of the documentation restructuring project I'm currently working on. As part of that project, each document represents one of the four Diataxis types (tutorial, how-to, reference, explanation).

@@ -0,0 +1,38 @@
# About the Processors Controller

A transformation takes an attached input file and turns its contents into ODM objects. Some transformations produce indexable metadata: for example, ODM cannot index a CSV file directly as a source of metadata, but the `metadata-basic` transformation converts it into TSV-based metadata objects that ODM can index. Others turn raw data into structured ODM objects: for example, the `hdf5-cells` transformation converts single-cell HDF5 (H5AD/H5) files into ODM Cell Groups and Expression Groups. Either way, a transformation bridges the gap between the file you have and the ODM objects you need.

@MariaBorodaenko, Jun 26, 2026

Contributor

Since the title is "About the Processors Controller", it makes sense to start with a sentence about the controller, not about transformations; otherwise users may be confused, since they don't yet know how the two relate.

Also, I would suggest highlighting why it is needed and what problem it solves: there is no need to wrangle files before ingesting them into ODM.

The last thing: HDF5/H5 files do not necessarily contain raw data. H5 files are usually raw at first, but are later processed and enriched with clusters, UMAP and/or PCA values, cell types, and other cell-metadata values. The expression values can also be normalised and the features properly annotated. Sometimes an HDF5 file contains both the original raw data and the annotated/processed data.

We recommend that users transform and index only the processed single-cell data from the HDF5 file, which is the most meaningful for further exploration and analysis.

Thus I would suggest stating here that we can transform both metadata (example with samples) and data (example with hdf5-cells).


Contributor

Suggested change
A transformation takes an attached input file and turns its contents into ODM objects. Some transformations produce indexable metadata: for example, ODM cannot index a CSV file directly as a source of metadata, but the `metadata-basic` transformation converts it into TSV-based metadata objects that ODM can index. Others turn raw data into structured ODM objects: for example, the `hdf5-cells` transformation converts single-cell HDF5 (H5AD/H5) files into ODM Cell Groups and Expression Groups. Either way, a transformation bridges the gap between the file you have and the ODM objects you need.
The Processors Controller is the ODM API for running file transformations: it lets you discover what transformation images are available, configure how they process your data, and execute and monitor transformation jobs. Its purpose is to remove preprocessing as a barrier to ingestion - you attach the file you have and ODM transforms it into queryable objects. Transformations cover both metadata and data: `metadata-basic` converts a CSV into indexable sample objects, while `hdf5-cells` extracts cell-type annotations, dimensional-reduction results, and processed expression values from an H5AD or H5 file into ODM Cell Groups and Expression Groups.


These three map onto a simple idea: an image is *what processing to do*, a configuration is *how to tune it*, and a job is *doing it once, against specific files*. The same image and configuration can drive many independent jobs.

## Transformation configurations

Contributor

I would suggest swapping the configurations and images sections. Starting with images makes more sense, since the transformation itself is defined in the image. Configurations are an additional, optional handle that can be used to "tune" the transformation to your case (only within the scope of what is already implemented in the code).

We could also state clearly that users cannot change the transformation image itself, but can adjust it within the supported scope via configurations.


## Transformation images

Transformation images are versioned container images (self-contained, ready-to-run packages of the processing software) that run the processing logic. Available image versions can be queried through the API. Each image handles a specific input/output format pair: for example, `metadata-basic` converts CSV files to TSV-based metadata objects (samples, libraries, preparations, cell metadata, expression, or variants); `hdf5-cells` converts H5AD or H5 single-cell files into ODM Cell Groups, Expression Groups, and associated metadata. When starting a job you can specify either `latest` or a specific release tag.

Contributor

As far as I know, only transforming CSV files to samples is supported at the moment. Let's remove the other options.


A job is not where your results are stored. As it runs, the transformation writes its output into ODM as ordinary objects, so once the job finishes those results are part of your ODM data like anything else.

When a job finishes it stops running, and the resources that were processing it are released: nothing keeps running in the background. What stays behind is the job's record: its final status and full logs. These are kept indefinitely, with no expiry, and are never deleted. You retrieve them through the same API endpoints whether the job is still running or finished long ago, so from your side nothing about fetching a job's status or logs changes once it is done.

Contributor

This is true for the transformation job itself, that is, the container created for it. But it is not so for ODM: the job is considered done once the data import finishes, but right after that ODM starts its internal indexing. For some large single-cell data files, indexing can take hours and significantly load ODM resources. I would suggest rephrasing this part, since it creates the impression that users can execute multiple jobs one after another, which can be problematic in reality.



Contributor

Suggested change
When a job finishes it stops running, and the resources that were processing it are released: nothing keeps running in the background. What stays behind is the job's record: its final status and full logs. These are kept indefinitely, with no expiry, and are never deleted. You retrieve them through the same API endpoints whether the job is still running or finished long ago, so from your side nothing about fetching a job's status or logs changes once it is done.
When a job finishes, the transformation is complete and its container is released. Be aware that ODM continues internal indexing after the job completes; the imported data may not be immediately available via the API until indexing is done. The job record with its final status and logs is retained in line with ODM's standard log retention policy and is retrievable through the same API endpoints, whether the job is still running or long finished.

Transformations can be run in dry-run mode by setting `dry_run: true`. A dry run validates the configuration and input without writing any objects to ODM, which makes it the safest way to iterate on a configuration before committing results. For the steps to run a job in dry-run mode, see [How to run a transformation](how-to-run-a-transformation.md).

!!! warning "Editorial TODO: resolve before publishing"
    Confirm which transformation images actually implement dry-run (validate-without-write). The flag is accepted for all jobs, but honoring it is image-specific. If not universal, scope this wording (e.g. to single-cell).

Contributor

Dry run is currently implemented for hdf5-cells only.
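Given that only `hdf5-cells` currently honors the flag, a dry run and the following full run differ only in `dry_run`. A minimal Python sketch of building both job bodies, assuming the field names `input_accessions`, `dry_run`, and `configuration_reference` used elsewhere in these docs; the top-level `image` object, the integer configuration id, and the `build_job_body` helper are hypothetical.

```python
import json
from typing import List, Optional


def build_job_body(image_name: str, input_accessions: List[str],
                   dry_run: bool = False,
                   config_id: Optional[int] = None) -> dict:
    """Hypothetical helper; the "image" object shape is an assumption."""
    body = {
        "image": {"name": image_name},  # assumed shape, not confirmed by the docs
        "input_accessions": list(input_accessions),
    }
    if config_id is not None:
        body["configuration_reference"] = {"id": config_id}
    if dry_run:
        # Only set when needed: dry_run defaults to false when omitted.
        body["dry_run"] = True
    return body


# Placeholder accession and configuration id, for illustration only.
dry = build_job_body("hdf5-cells", ["ATTACHMENT_ACCESSION"], dry_run=True, config_id=1)
full = build_job_body("hdf5-cells", ["ATTACHMENT_ACCESSION"], config_id=1)
print(json.dumps(dry, indent=2))
```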


## Transformation logs

Each job produces a log recording processing steps, warnings, errors, the source file name and accession, and the accessions of any ODM objects it created. Logs are retained permanently and are always retrievable through the API: the logs endpoint returns the live log while the job runs and the archived log once it has finished, transparently. A finished job's log is never unavailable. Separately, the log is also uploaded into ODM as an attachment on the owning study, so it sits alongside the job's other generated files; this attachment is an additional copy and is not what keeps the log available.

Contributor

I would suggest avoiding overly strict wording such as "permanently", "always", and "never unavailable".

Regarding the copy uploaded as an attachment, we hope to remove it next week under the ticket https://genestack.atlassian.net/browse/BIA-151; I will keep you updated.


- An API token. See [Authentication and tokens](../getting-a-genestack-api-token.md).
- Curator group membership.
- Every study that contains an attachment listed in the job's `input_accessions` must be shared with you. Requests that reference an attachment you cannot access are rejected with a generic `Item not found or insufficient permission` message.

Contributor

Please note that this is to be changed after implementing https://genestack.atlassian.net/browse/ODM-13244 (case 1.4)

```
GET /api/v1/transformations/images
```

Note the `name` and `version` of the image you want to use. Use `"latest"` for the most recent version, or a specific release tag (for example, `"0.0.7"`) for reproducibility in production pipelines. See [Available images reference](available-images-reference.md) for the full catalogue and per-image guidance.

Contributor

Suggested change
Note the `name` and `version` of the image you want to use. Use `"latest"` for the most recent version, or a specific release tag (for example, `"0.0.7"`) for reproducibility in production pipelines. See [Available images reference](available-images-reference.md) for the full catalogue and per-image guidance.
Note the `name` and `version` of the image you want to use. The `version` field is optional - if omitted or set to `"latest"`, the most recent version is used automatically. Specify an explicit version tag (for example, `"0.0.7"`) for reproducibility in production pipelines. See [Available images reference](available-images-reference.md) for the full catalogue and per-image guidance.
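A minimal Python sketch of the version rule above: omit `version` (or pass `"latest"`) for the most recent image, or pin an explicit tag. It also builds, without sending, the `GET /api/v1/transformations/images` request. The `Genestack-API-Token` header name and the example base URL are assumptions, and both helpers are hypothetical.

```python
import urllib.request
from typing import Optional


def image_reference(name: str, version: Optional[str] = None) -> dict:
    """Omit version (or pass "latest") to run the most recent image version."""
    ref = {"name": name}
    if version is not None and version != "latest":
        ref["version"] = version  # explicit tag, e.g. "0.0.7", for reproducibility
    return ref


def list_images_request(base_url: str, token: str) -> urllib.request.Request:
    """Build (but do not send) GET /api/v1/transformations/images."""
    return urllib.request.Request(
        base_url.rstrip("/") + "/api/v1/transformations/images",
        headers={"Genestack-API-Token": token},  # assumed header name
        method="GET",
    )
```

To send it, pass the request to `urllib.request.urlopen` and decode the JSON body. The response shape is not described here, so inspect it before relying on specific fields.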

- The source attachment already uploaded to a study in ODM (you need its accession). See [Import attached files](../import-data-in-odm.md).

Contributor

Suggested change
- The source attachment already uploaded to a study in ODM (you need its accession). See [Import attached files](../import-data-in-odm.md).
- The source attachment already uploaded to a study in ODM (you need its accession). See [Import attached files](../import-data-in-odm.md#attach-a-file).

Comment on lines +78 to +87
By default the job runs against the latest version of the configuration. To pin a specific version (for example, to reproduce an earlier job), add a `version` to the `configuration_reference` object:

```json
"configuration_reference": {
  "id": <config_id>,
  "version": <version>
}
```

> **[Subject to change (ODM-13233)]** The exact name and shape of the `configuration_reference` field are not yet finalized. Verify against the released API before relying on it.
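Because the field is still subject to change (ODM-13233), it can help to build it in one place. A minimal Python sketch, assuming the `configuration_reference` shape shown above and integer `id`/`version` values; `configuration_reference` here is a hypothetical helper name.

```python
import json
from typing import Optional


def configuration_reference(config_id: int, version: Optional[int] = None) -> dict:
    """Reference a stored configuration; omit version to use the latest one."""
    ref = {"id": config_id}
    if version is not None:
        ref["version"] = version  # pin to reproduce an earlier job
    return ref


# Placeholder id and version, for illustration only.
print(json.dumps({"configuration_reference": configuration_reference(42, 3)}))
```

If the field is renamed or reshaped when ODM-13233 lands, only this helper needs to change.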

Contributor

I would suggest moving the part starting with "By default the job runs against the latest version of the configuration..." to the previous section, ## Step 2: Create or identify a configuration.

Comment on lines +74 to +76
`volume_size` is a Kubernetes resource quantity string (for example `"30Gi"` for 30 GiB, or `"512Mi"`), not a plain number: the request is rejected if the value is not a valid quantity or is zero. As a guideline: for H5AD input files, allocate at least 1.4× the original file size; for 10x H5 input files, at least 4× the original file size; for CSV files, a small value such as `"30Gi"` is sufficient. If you omit `volume_size`, the image's default is used (falling back to `"30Gi"`).

The body also accepts an optional `memory_size` quantity string (for example `"512Mi"`). Increase it if a job ends in `FAILED` with `status.reason: OOMKilled`.
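A minimal Python sketch for checking a quantity string before submitting it and for choosing a larger `memory_size` after an `OOMKilled` failure. The parser covers only a simplified subset of the Kubernetes quantity grammar (integers or decimals with `k`/`M`/`G`/`T` or `Ki`/`Mi`/`Gi`/`Ti` suffixes, no exponents or milli units), so the server's own validation remains authoritative. The doubling factor is an arbitrary choice, and both helpers are hypothetical.

```python
import re
from decimal import Decimal

# Simplified subset of the Kubernetes quantity grammar (no exponents, no milli).
_MULTIPLIERS = {
    "": 1, "k": 10**3, "M": 10**6, "G": 10**9, "T": 10**12,
    "Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40,
}
_QUANTITY = re.compile(r"^([0-9]+(?:\.[0-9]+)?)(Ki|Mi|Gi|Ti|k|M|G|T)?$")


def quantity_to_bytes(quantity: str) -> int:
    """Parse e.g. "512Mi" or "30Gi" into bytes; reject invalid or zero values."""
    match = _QUANTITY.match(quantity)
    if match is None:
        raise ValueError(f"not a valid quantity: {quantity!r}")
    number, suffix = match.group(1), match.group(2) or ""
    value = int(Decimal(number) * _MULTIPLIERS[suffix])
    if value == 0:
        raise ValueError("quantity must be greater than zero")
    return value


def larger_memory_size(current: str, factor: int = 2) -> str:
    """Suggest a bigger memory_size (in Mi, rounded up) after an OOMKilled failure."""
    new_bytes = quantity_to_bytes(current) * factor
    mebibytes = -(-new_bytes // 2**20)  # ceiling division
    return f"{mebibytes}Mi"
```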

Contributor

Suggested change
`volume_size` is a Kubernetes resource quantity string (for example `"30Gi"` for 30 GiB, or `"512Mi"`), not a plain number: the request is rejected if the value is not a valid quantity or is zero. As a guideline: for H5AD input files, allocate at least 1.4× the original file size; for 10x H5 input files, at least 4× the original file size; for CSV files, a small value such as `"30Gi"` is sufficient. If you omit `volume_size`, the image's default is used (falling back to `"30Gi"`).
The body also accepts an optional `memory_size` quantity string (for example `"512Mi"`). Increase it if a job ends in `FAILED` with `status.reason: OOMKilled`.
Two optional parameters control resource allocation for the job: `volume_size` and `memory_size`.
The first, `volume_size`, sets the disk space allocated for processing. It must be a Kubernetes resource quantity string, for example, `"4Gi"` for 4 GiB or `"512Mi"` for 512 MiB. The request is rejected if the value is not a valid quantity or is zero. As a guideline: for H5AD input files, allocate at least 1.4× the original file size; for 10x H5 input files, at least 4×; for CSV files, a small value is typically sufficient.
`memory_size` sets the RAM allocated for processing, using the same quantity format, for example, `"512Mi"`. Increase it if a job ends in `FAILED` with `status.reason: OOMKilled` (out-of-memory termination).
For default values for both parameters, see [Available images reference](/transformations/available-images-reference.md).
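A minimal Python sketch of the sizing guideline above: at least 1.4× the input size for H5AD and at least 4× for 10x H5, rounded up to whole GiB. For CSV it returns `None`, meaning `volume_size` is left out so the image default applies. The format keys `"h5ad"`, `"10x-h5"`, and `"csv"` and the helper name are hypothetical. Multipliers are kept in tenths so the arithmetic stays exact integer math.

```python
from typing import Optional

# Guideline multipliers (1.4x and 4x), stored in tenths to avoid float rounding.
_VOLUME_FACTOR_TENTHS = {"h5ad": 14, "10x-h5": 40}


def suggested_volume_size(input_size_bytes: int, input_format: str) -> Optional[str]:
    """Return a volume_size quantity in whole Gi (rounded up), or None for CSV."""
    fmt = input_format.lower()
    if fmt == "csv":
        return None  # omit volume_size; a small default is typically sufficient
    tenths = _VOLUME_FACTOR_TENTHS[fmt]
    needed = input_size_bytes * tenths
    gib = -(-needed // (10 * 2**30))  # ceiling division by 10 * 1 GiB
    return f"{max(gib, 1)}Gi"
```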


- Configuration validation messages.
- The file structure report: which metadata keys are present in your input file.
- Linking validation results: whether cell batch values resolve to existing ODM objects.

Contributor

Suggested change
- Linking validation results: whether cell batch values resolve to existing ODM objects.
- Linking validation results: whether the transformation output can be linked to existing ODM objects.


## Step 6: Submit the full run

Once the dry run completes without issues, resubmit with `dry_run` set to `false`:

Contributor

Suggested change
Once the dry run completes without issues, resubmit with `dry_run` set to `false`:
Once the dry run completes without issues, resubmit the job. You can either set `dry_run` to `false` or omit it entirely, since it defaults to `false`:

```json
}
```

Monitor and review logs the same way as Steps 4–5. When the job completes, the logs contain the ODM accessions of all objects that were created or updated. Logs are uploaded as an attachment to the same study.

Contributor

Suggested change
Monitor and review logs the same way as Steps 4–5. When the job completes, the logs contain the ODM accessions of all objects that were created or updated. Logs are uploaded as an attachment to the same study.
Monitor and review logs the same way as Steps 4–5. When the job completes, the logs contain the ODM accessions of all objects that were created or updated.
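Monitoring a job is a polling loop over its status until it reaches a terminal state. A minimal Python sketch, assuming a caller-supplied `fetch_state` function that returns the job's current state string. Only `FAILED` appears in this guide; `SUCCEEDED` is a hypothetical placeholder for the success state, and `wait_for_job` is a hypothetical helper.

```python
import time
from typing import Callable

# Assumption: terminal state names. "SUCCEEDED" is a placeholder;
# check the real state values returned by the job status endpoint.
TERMINAL_STATES = {"SUCCEEDED", "FAILED"}


def wait_for_job(fetch_state: Callable[[], str], interval_s: float = 10.0,
                 timeout_s: float = 3600.0,
                 sleep: Callable[[float], None] = time.sleep) -> str:
    """Poll fetch_state() until a terminal state is seen or the timeout expires."""
    waited = 0.0
    while True:
        state = fetch_state()
        if state in TERMINAL_STATES:
            return state
        if waited >= timeout_s:
            raise TimeoutError(f"job still {state} after {waited:.0f}s")
        sleep(interval_s)
        waited += interval_s
```

As noted in the review above, a finished job does not mean the data is immediately queryable, because ODM continues internal indexing after the job completes.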

This should again be changed under https://genestack.atlassian.net/browse/BIA-151


## Use-case guides

- For single-cell HDF5 ingestion, see [single-cell/single-cell-getting-started.md](single-cell/single-cell-getting-started.md).

Contributor

Suggested change
- For single-cell HDF5 ingestion, see [single-cell/single-cell-getting-started.md](single-cell/single-cell-getting-started.md).
- For single-cell HDF5 ingestion, see [Single-cell data in ODM: Getting started](single-cell/single-cell-getting-started.md).

- For CSV-to-TSV conversion, see [csv-to-tsv/how-to-transform-csv-to-tsv.md](csv-to-tsv/how-to-transform-csv-to-tsv.md).

Contributor

Suggested change
- For CSV-to-TSV conversion, see [csv-to-tsv/how-to-transform-csv-to-tsv.md](csv-to-tsv/how-to-transform-csv-to-tsv.md).
- For CSV-to-TSV conversion, see [CSV to Sample Group](csv-to-tsv/how-to-transform-csv-to-tsv.md).

@@ -0,0 +1,104 @@
# How to manage transformation configurations

This guide shows you how to develop a transformation configuration and iterate on it until it produces the results you want. A configuration is a reusable, versioned JSON document that tells an image how to process your input. The workflow below takes you from a first draft, through dry-run testing, to a configuration you can run for real and reuse across jobs.

Contributor

Suggested change
This guide shows you how to develop a transformation configuration and iterate on it until it produces the results you want. A configuration is a reusable, versioned JSON document that tells an image how to process your input. The workflow below takes you from a first draft, through dry-run testing, to a configuration you can run for real and reuse across jobs.
This guide shows you how to develop a transformation configuration and iterate on it until it produces the results you want. A configuration is a reusable, versioned JSON document that tells an image how to process your input. The workflow below takes you from a first draft, through dry-run testing, to a validated configuration you can run against your data and reuse across jobs.


## The iteration loop

Developing a configuration is a loop. You create a first draft, submit it as a dry-run job, review the logs, and update the configuration to fix whatever the dry run surfaced, repeating until the dry run is clean. Only then do you submit a full run.

Contributor

Suggested change
Developing a configuration is a loop. You create a first draft, submit it as a dry-run job, review the logs, and update the configuration to fix whatever the dry run surfaced, repeating until the dry run is clean. Only then do you submit a full run.
Developing a configuration is a loop. You create a first draft, submit it as a dry-run job, review the logs, and update the configuration based on the results, repeating until the dry run completes without issues and produces the output you expect. Only then do you submit a full run. Once the configuration is working, you can reuse it for any input file with the same structure.


The request body requires `data`: the image-specific processing specification. `name` and `description` are optional but recommended, since the list and get responses surface them so you can identify the configuration later.
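A minimal Python sketch of assembling that body: `data` is required, while `name` and `description` are added only when given. It assumes `data` is a JSON object, and `configuration_request_body` is a hypothetical helper. The real `data` schema is image-specific, so the placeholder used in the usage line below is not a valid configuration for any image.

```python
import json
from typing import Optional


def configuration_request_body(data: dict, name: Optional[str] = None,
                               description: Optional[str] = None) -> str:
    """Serialize a configuration body: `data` is required, the rest optional."""
    if not isinstance(data, dict):
        raise ValueError("'data' (the image-specific specification) is required")
    body = {"data": data}
    if name is not None:
        body["name"] = name  # surfaced by the list and get responses
    if description is not None:
        body["description"] = description
    return json.dumps(body)


# Placeholder data, for illustration only.
print(configuration_request_body({"example_key": "example_value"}, name="draft"))
```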

For the `metadata-basic` image (CSV to Sample group), a minimal request looks like this:

Contributor

Do we have a configuration for `metadata-basic` at all? Can it affect anything in the transformation image flow?

```json
}
```

Keep that `id`: you use it to retrieve, update, and submit jobs against the configuration. For the single-cell HDF5 `hdf5-cells` image, the `data` field follows a different schema. See the [Configuration Reference](single-cell/configuration-reference.md).

Contributor

Suggested change
Keep that `id`: you use it to retrieve, update, and submit jobs against the configuration. For the single-cell HDF5 `hdf5-cells` image, the `data` field follows a different schema. See the [Configuration Reference](single-cell/configuration-reference.md).
Keep that `id`: you use it to retrieve, update, and reference the configuration in job submissions. For the single-cell HDF5 `hdf5-cells` image, the `data` field follows a different schema. See the [Configuration Reference](single-cell/configuration-reference.md).


## Submit a dry run and review the logs

Submit the configuration as a dry-run job, then read the job logs to see how it behaved against your real input without writing any data. The job-submission and log-retrieval endpoints are covered in [How to run a transformation](how-to-run-a-transformation.md).

Contributor

Suggested change
Submit the configuration as a dry-run job, then read the job logs to see how it behaved against your real input without writing any data. The job-submission and log-retrieval endpoints are covered in [How to run a transformation](how-to-run-a-transformation.md).
Submit a dry-run job referencing this configuration against your input file, then review the logs to verify it behaves as expected, without writing any data to ODM. The job-submission and log-retrieval endpoints are covered in [How to run a transformation](how-to-run-a-transformation.md).

```
PUT /api/v1/transformations/configurations/{id}
```

The request body follows the same structure as the `POST` endpoint. Updating does not overwrite the configuration: the current state is archived as a previous version and the active version is incremented. The same `id` is reused across all iterations, and every earlier version stays retrievable, so you can audit or re-run a job with the exact parameters used in the past.
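A minimal Python sketch of building, without sending, the update request. It reuses the same configuration `id` on every iteration, while the server keeps the earlier state as a previous version. The `Genestack-API-Token` header name and the example base URL are assumptions, and `update_configuration_request` is a hypothetical helper.

```python
import json
import urllib.request


def update_configuration_request(base_url: str, token: str, config_id: int,
                                 body: dict) -> urllib.request.Request:
    """Build (not send) PUT /api/v1/transformations/configurations/{id}."""
    url = f"{base_url.rstrip('/')}/api/v1/transformations/configurations/{config_id}"
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),  # same structure as the POST body
        headers={
            "Content-Type": "application/json",
            "Genestack-API-Token": token,  # assumed header name
        },
        method="PUT",
    )
```

A job can then reference either the latest version or an earlier one through `configuration_reference`.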

Contributor

Suggested change
The request body follows the same structure as the `POST` endpoint. Updating does not overwrite the configuration: the current state is archived as a previous version and the active version is incremented. The same `id` is reused across all iterations, and every earlier version stays retrievable, so you can audit or re-run a job with the exact parameters used in the past.
The request body follows the same structure as the `POST` endpoint. Updating does not overwrite the configuration: the current state is saved as a previous version and the active version is incremented. The same `id` is reused across all iterations, and any version can be referenced in a job - by default the latest is used.



Resubmit the dry-run job against the same configuration and review the logs again. Repeat until the dry run completes without errors or warnings that require action, then submit the full run.

Contributor

Suggested change
Resubmit the dry-run job against the same configuration and review the logs again. Repeat until the dry run completes without errors or warnings that require action, then submit the full run.
Resubmit the dry-run job with the updated configuration and review the logs again. Repeat until the dry run completes without issues and produces the output you expect, then submit the full run.


## Review your configurations

At any point you can inspect what you have. To list your configurations:

Contributor

Suggested change
At any point you can inspect what you have. To list your configurations:
At any point you can inspect all available configurations. To list them:

```
GET /api/v1/transformations/configurations
```

The response is a paginated envelope: the configurations are in the `items` array, and `limit`/`offset` query parameters page through the results (default 100 per page). The list returns the latest version of each configuration, including its full `data`, so you can review the current state of each one without a second request. See [Pagination](api-reference.md#pagination).
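A minimal Python sketch of walking the `items`/`limit`/`offset` envelope. It assumes a caller-supplied `fetch_page(limit, offset)` that returns the decoded JSON envelope, and it stops at the first page shorter than `limit`. If the envelope also exposes a total count, that would be a sturdier stop condition, but no such field is documented here. `iter_all_items` is a hypothetical helper.

```python
from typing import Callable, Iterator


def iter_all_items(fetch_page: Callable[[int, int], dict],
                   limit: int = 100) -> Iterator[dict]:
    """Yield every item from a limit/offset paginated endpoint."""
    offset = 0
    while True:
        items = fetch_page(limit, offset).get("items", [])
        yield from items
        if len(items) < limit:  # a short (or empty) page means we reached the end
            return
        offset += limit
```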

Contributor

This part will be updated after https://genestack.atlassian.net/browse/ODM-13238 is tested.

A new parameter, `include_archived`, will be added and will default to `false`. By default only active configurations will be retrieved; archived ones can be included intentionally by setting `include_archived` to `true`.


## Reuse a working configuration

Configurations are reusable. Once a configuration is working correctly, you can apply it to multiple input files in subsequent jobs without recreating it.

Contributor

Suggested change
Configurations are reusable. Once a configuration is working correctly, you can apply it to multiple input files in subsequent jobs without recreating it.
Once you have validated a configuration through dry-run testing, it becomes the foundation of your ingestion pipeline: the same configuration can be applied to any number of input files that share the same structure or come from the same source, without any further setup. This makes it straightforward to automate ingestion, for example, to process a batch of files or integrate transformation jobs into a recurring pipeline.
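A minimal Python sketch of batch reuse: one job body per input file, all pointing at the same validated configuration. It assumes the `configuration_reference` and `input_accessions` fields from these docs. The top-level `image` object and the `batch_job_bodies` helper are hypothetical.

```python
from typing import Iterable, List


def batch_job_bodies(image_name: str, config_id: int,
                     input_accessions: Iterable[str]) -> List[dict]:
    """One job body per input file, all sharing the same configuration."""
    return [
        {
            "image": {"name": image_name},                 # assumed field shape
            "configuration_reference": {"id": config_id},  # latest version by default
            "input_accessions": [accession],
        }
        for accession in input_accessions
    ]
```

One of the review comments above notes that ODM indexing after a large single-cell import can take hours. For such files, consider submitting these bodies one at a time rather than all at once.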

@@ -0,0 +1,70 @@
# Available transformation images reference

Each transformation image handles a specific input/output format pair. Use `GET /api/v1/transformations/images` to retrieve the current list of available images and their versions at runtime.

Contributor

Suggested change
Each transformation image handles a specific input/output format pair. Use `GET /api/v1/transformations/images` to retrieve the current list of available images and their versions at runtime.
Each transformation image defines what input formats it accepts and what ODM objects it produces. Use `GET /api/v1/transformations/images` to retrieve the current list of available images and their versions.

```json
}
```

For the full how-to, see [csv-to-tsv/how-to-transform-csv-to-tsv.md](csv-to-tsv/how-to-transform-csv-to-tsv.md).

Contributor

Suggested change
For the full how-to, see [csv-to-tsv/how-to-transform-csv-to-tsv.md](csv-to-tsv/how-to-transform-csv-to-tsv.md).
For the full how-to, see [CSV to Sample Group](csv-to-tsv/how-to-transform-csv-to-tsv.md).


**Available versions:** `latest`

**Use case:** Converts a CSV file attached to a study into an ODM Sample metadata group. The configuration `data` field specifies the source format and the destination entity type.

Contributor

Let's review whether a configuration can affect anything for the `metadata-basic` image.


**Default volume:** `5Gi`

**Available versions:** `latest`

Contributor

I would suggest removing this field; it does not provide any details or helpful information.



## Version conventions

- `latest` is an alias for the most recent stable version of an image. Use it for exploration and development.
- Specific tags (for example, `0.0.7`) pin the job to a particular image version. Use them in production pipelines where reproducibility matters.

Contributor

Suggested change
- Specific tags (for example, `0.0.7`) pin the job to a particular image version. Use them in production pipelines where reproducibility matters.
- Specific tags (for example, `0.0.7`) pin the job to a particular image version. Use them when you need to reproduce a previous result - the exact image and version used in any job are recorded in its logs.

Comment on lines +65 to +67
## Known limitations

Only one transformation process can be run per attachment.

Contributor

This limitation was removed by a recent update to the ODM multipart endpoints, so this section can be removed: https://genestack.atlassian.net/browse/ODM-13307


**Input formats:** H5AD (AnnData), 10x Genomics H5 (converted internally to H5AD before processing), Legacy 10x Genomics H5 v<3 (single-genome only; multi-genome legacy files are not supported).

**Output formats:** ODM Cell Group, Expression Group, and attachments, with optional Sample, Library, and Preparation groups.

Contributor

Suggested change
**Output formats:** ODM Cell Group, Expression Group, and attachments, with optional Sample, Library, and Preparation groups.
**Output formats:** ODM Cell Group, Expression Group, with optional Sample, Library, and Preparation groups.
