Guidelines for large sample sizes #378

@igordot

Do you have updated guidance for handling large datasets (>100 samples and/or >500,000 cells)?

A previous issue addressed this topic nearly 4 years ago (#108), but I'm hoping you have more insights now. The clearest recommendation from that discussion was: "for very large datasets with many samples, use large k~[50, 100] and small prop~[0.01, 0.1] to reduce neighborhood redundancy."

The main concerns are:

  • Sample representation: How do I ensure each neighborhood captures enough cells from each sample? Should k scale with sample size?
  • Computational constraints: Does adjusting prop address memory/computation limits, or are other strategies needed?
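
As a rough illustration of both concerns, here is a back-of-envelope sketch. It is not taken from the package. It assumes that a neighborhood holds roughly k cells, that samples are equally sized and evenly mixed, and that prop is the fraction of cells initially sampled as neighborhood index cells. The function names are hypothetical.

```python
def mean_cells_per_sample(k: int, n_samples: int) -> float:
    """Average cells each sample contributes to a neighborhood of ~k cells.

    Assumes equally sized, evenly mixed samples (an idealization).
    """
    return k / n_samples


def initial_index_cells(n_cells: int, prop: float) -> int:
    """Number of cells initially sampled as neighborhood indices.

    Assumes prop is the sampled fraction of all cells.
    """
    return round(n_cells * prop)


if __name__ == "__main__":
    n_samples, n_cells = 100, 500_000  # thresholds mentioned in the question
    for k in (50, 100):
        print(f"k={k}: ~{mean_cells_per_sample(k, n_samples)} cells per sample")
    for prop in (0.01, 0.1):
        print(f"prop={prop}: ~{initial_index_cells(n_cells, prop)} index cells")
```

Under these assumptions, even k=100 leaves about one cell per sample per neighborhood at 100 samples, which is why the question of scaling k matters. The number of initial index cells grows linearly with prop, so lowering it would shrink the number of neighborhoods to test but not the cost of building the graph itself.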
