modelGeneVar() extremely slow with HDF5-backed SingleCellExperiment #124

@keyingkuang

Hi scran team,

Thanks for the great package! I’m encountering a performance issue when using modelGeneVar() on a large HDF5-backed SingleCellExperiment.

Previously, I used:
variable_genes <- scran::modelGeneVar(sce_object)
This used to complete in around 10 hours on the same large, HDF5-backed dataset. The same call now runs for several days without finishing.

Here’s how we construct the object:
log_object <- HDF5Array::H5ADMatrix(raw_file_path, "data")
count_object <- HDF5Array::H5ADMatrix(raw_file_path, "counts")
sce_object <- SingleCellExperiment::SingleCellExperiment(
  assays = list(counts = count_object, logcounts = log_object)
)

It seems that modelGeneVar() is either not fully optimized for delayed operations or is trying to load the entire matrix into memory, although I don’t see an explicit memory error.
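To make the slowdown easier to reproduce, here is a minimal sketch of how the runtime and the block-processing settings could be checked on a small stand-in object. Everything here is an assumption for illustration: the simulated matrix, the temporary HDF5 file, and the names `sce_small` and `stats_small` are hypothetical and not part of my real pipeline, and I have not confirmed that chunk layout or block size is the cause.

```r
library(SingleCellExperiment)
library(HDF5Array)
library(DelayedArray)
library(scran)

# Hypothetical stand-in data: 200 genes x 50 cells with a spread of gene means.
# The real object is built from an .h5ad file via H5ADMatrix() instead.
set.seed(1)
n_genes <- 200
n_cells <- 50
gene_means <- 2^runif(n_genes, min = 0, max = 6)
counts <- matrix(
  rpois(n_genes * n_cells, lambda = rep(gene_means, times = n_cells)),
  nrow = n_genes, ncol = n_cells
)

# Write the log-transformed values to a temporary HDF5 file so that the
# assay is disk-backed, as in the real object.
h5_path <- tempfile(fileext = ".h5")
log_object <- writeHDF5Array(log2(counts + 1), filepath = h5_path, name = "logcounts")
sce_small <- SingleCellExperiment(assays = list(logcounts = log_object))

# On-disk chunk dimensions of the assay (may be NULL for some backends).
print(chunkdim(logcounts(sce_small)))

# Current DelayedArray block size in bytes; setAutoBlockSize() changes it.
print(getAutoBlockSize())

# Time the call on a subset of genes to get a rough per-gene cost.
timing <- system.time(
  stats_small <- modelGeneVar(sce_small[1:100, ])
)
print(timing[["elapsed"]])
```

Running the same timing on a few hundred rows of the real object, before and after changing the block size with `setAutoBlockSize()` or passing a parallel backend through the `BPPARAM` argument of modelGeneVar(), should show whether either setting affects the runtime.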

Do you have any suggestions on how to speed this up or safely apply modelGeneVar() to HDF5-backed data?

Thanks,
Maggie
