[Feature] datasets fs: inspect and download files under a task (ls/get/download) #1104

Description

@xdlkc

Feature Category

  • Sandbox
  • Actions
  • Deployments
  • SDK & API
  • Envhub
  • CLI
  • Performance & Optimization
  • Documentation & Examples

Problem Statement

There is currently no way to inspect or fetch individual files inside a single
task from the OSS dataset registry. Users can list task IDs (datasets tasks)
but to look at or download a task's contents they must reach for raw ossutil
and reconstruct the datasets/{org}/{dataset}/{split}/{task}/ key layout by
hand. Two concrete gaps:

  1. No file listing per task: no command answers "what files does task X
    contain?"
  2. No file read / download: no command prints a single task file to stdout
    or downloads a task file/directory to a local path.
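To make the manual step concrete, here is a minimal sketch of the prefix a user currently has to rebuild by hand before calling ossutil. The helper name `task_prefix` and the component validation are assumptions for illustration; only the `datasets/{org}/{dataset}/{split}/{task}/` layout comes from this issue.

```python
def task_prefix(org: str, dataset: str, split: str, task: str) -> str:
    """Build the object-key prefix for a directory-style task.

    Hypothetical helper: only the key layout itself is taken from the issue.
    """
    parts = [org, dataset, split, task]
    for part in parts:
        # Assumed rule: a component must be non-empty and must not contain "/".
        if not part or "/" in part:
            raise ValueError(f"invalid key component: {part!r}")
    return "datasets/" + "/".join(parts) + "/"


if __name__ == "__main__":
    print(task_prefix("acme", "demo", "train", "task-001"))
```

Every object whose key starts with this prefix belongs to the task, which is the lookup the proposed commands would perform on the user's behalf.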

Proposed Solution

Add a datasets fs subcommand group (alias files) with three operations,
backed by new registry/client methods.

  1. Registry layer (OssDatasetRegistry / BaseDatasetRegistry /
    DatasetClient):

    • list_task_files(org, dataset, split, task_id, path="") -> list[TaskFile]
      — list files under a task, paths relative to the task root.
    • get_task_file(org, dataset, split, task_id, path) -> bytes | None
      — read one task file by relative path; None when the object is absent.
    • New TaskFile dataclass (path: str, size: int | None).
    • Support both layouts: directory-style tasks ({task}/...) and
      single-file tasks ({task}.json directly under the split).
  2. CLI layer (datasets fs ...):

    • datasets fs ls --org --dataset --split --task [--path]: list files.
    • datasets fs get --org --dataset --split --task [--path]: print one
      file to stdout (--path may be omitted when the task contains exactly
      one file, which is then resolved automatically).
    • datasets fs download --org --dataset --split --task --path --dest:
      download a single file or a whole directory subtree to a local path.
    • JSON output (-o json) for ls/get/download.
  3. Path safety: relative task paths are normalized and validated (absolute
    paths and .. traversal are rejected; "." and empty segments are dropped),
    so a crafted --path cannot escape the task root.
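The registry methods and the path-safety rule could fit together roughly as follows. This is a sketch under stated assumptions, not the proposed implementation: a plain dict of object keys stands in for the OSS bucket, `normalize_task_path` is a hypothetical helper name, and reporting a single-file task under the relative path `{task_id}.json` is a guess at the intended behavior. The `TaskFile` fields and the two method signatures follow the issue, apart from the extra `objects` argument.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TaskFile:
    path: str            # relative to the task root
    size: Optional[int]  # None when the size is unknown


def normalize_task_path(path: str) -> str:
    """Return a clean relative path; raise ValueError if it could escape the task root."""
    if path.startswith("/"):
        raise ValueError(f"absolute paths are not allowed: {path!r}")
    segments = []
    for segment in path.split("/"):
        if segment in ("", "."):
            continue  # drop "." and empty segments
        if segment == "..":
            raise ValueError(f"path traversal is not allowed: {path!r}")
        segments.append(segment)
    return "/".join(segments)


def list_task_files(objects, org, dataset, split, task_id, path=""):
    """List files under a task; `objects` maps object keys to bytes (stand-in for OSS)."""
    split_prefix = f"datasets/{org}/{dataset}/{split}/"
    task_root = f"{split_prefix}{task_id}/"
    rel = normalize_task_path(path)
    files = []
    for key in sorted(objects):
        if not key.startswith(task_root):
            continue
        relative = key[len(task_root):]
        # Keep the entry if it is the requested file or lies under the requested directory.
        if rel and relative != rel and not relative.startswith(rel + "/"):
            continue
        files.append(TaskFile(relative, len(objects[key])))
    if not files and not rel:
        # Single-file layout: {task}.json directly under the split (assumed relative name).
        single = f"{split_prefix}{task_id}.json"
        if single in objects:
            files.append(TaskFile(f"{task_id}.json", len(objects[single])))
    return files


def get_task_file(objects, org, dataset, split, task_id, path):
    """Read one task file by relative path; return None when the object is absent."""
    rel = normalize_task_path(path)
    split_prefix = f"datasets/{org}/{dataset}/{split}/"
    key = f"{split_prefix}{task_id}/{rel}"
    if rel and key in objects:
        return objects[key]
    if rel == f"{task_id}.json":
        return objects.get(f"{split_prefix}{task_id}.json")
    return None
```

Normalizing before the key is built means a value such as `../other-task/secret` is rejected up front instead of being resolved against the bucket. Matching on `task_root` with its trailing slash also keeps a task named `t1` from picking up objects that belong to `t10`.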

Notes
