Guidelines for large sample sizes

Do you have updated guidance for handling large datasets (>100 samples and/or >500,000 cells)?

A previous issue addressed this topic nearly 4 years ago (https://github.com/MarioniLab/miloR/issues/108), but I'm hoping you have more insights now. The clearest recommendation from that discussion was: "for very large datasets with many samples, use large k~[50, 100] and small prop~[0.01, 0.1] to reduce neighborhood redundancy."

The main concerns are:

- Sample representation: How do I ensure each neighborhood captures enough cells from each sample? Should `k` scale with the number of samples?
- Computational constraints: Does adjusting `prop` address memory/computation limits, or are other strategies needed?
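
To make both concerns concrete, here is a rough back-of-the-envelope sketch in Python. It is not miloR code: `nhood_budget` is a hypothetical helper, and it assumes each neighborhood holds about `k` cells spread evenly across samples, which real data will not satisfy exactly.

```python
def nhood_budget(n_cells, n_samples, k, prop):
    """Rough estimates only (hypothetical helper, not part of miloR).

    Assumes a neighborhood contains about k cells and that those cells
    are spread evenly across samples.
    """
    # Number of cells initially sampled as neighborhood index cells.
    n_index_cells = round(n_cells * prop)
    # Average number of cells each sample contributes to one neighborhood.
    cells_per_sample = k / n_samples
    return n_index_cells, cells_per_sample


# The two ends of the range quoted above, for 500,000 cells and 100 samples.
for k, prop in [(50, 0.1), (100, 0.01)]:
    n_index, per_sample = nhood_budget(500_000, 100, k, prop)
    print(f"k={k}, prop={prop}: ~{n_index} index cells, "
          f"~{per_sample:.1f} cells per sample per neighborhood")
```

Under these assumptions, even `k = 100` with 100 samples averages only about one cell per sample per neighborhood, which is why I am asking whether `k` needs to grow with the number of samples.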


Guidelines for large sample sizes #378

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Uh oh!

Guidelines for large sample sizes #378

Description

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions