Titration analysis with CPJUMP1 profiles #5
Pull request overview
This PR adds a CPJUMP1 titration analysis to quantify how Buscar score stability changes as the number of pooled perturbed cells decreases, and wires it into the CPJUMP1 analysis workflow.
Changes:
- Added a new CPJUMP1 titration analysis notebook and its nbconverted Python script, producing parquet + plot outputs.
- Updated the CPJUMP1 runner shell script to execute the new titration step.
- Added dependencies (e.g., pycytominer, requests, tqdm) and updated lock/pre-commit configuration.
Reviewed changes
Copilot reviewed 5 out of 8 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| uv.lock | Locks new dependencies needed for titration analysis and related tooling. |
| pyproject.toml | Adds build-system config and new runtime dependencies for the analysis workflow. |
| notebooks/3.cpjump1-analysis/run-cpjump1-buscar-analysus.sh | Adds execution of the titration analysis script (but currently references incorrect script filenames in the pipeline). |
| notebooks/3.cpjump1-analysis/nbconverted/5.cpjump1-titration-analysis.py | Implements the titration analysis pipeline and writes results/plots (but currently has import/path and avoidable per-iteration I/O issues). |
| notebooks/3.cpjump1-analysis/5.cpjump1-titration-analysis.ipynb | Source notebook for the titration analysis. |
| .pre-commit-config.yaml | Updates Ruff pre-commit revision. |
wli51
left a comment
LGTM! I don't think there are any fatal flaws with this experiment, but the analysis could be made more robust by sampling with replacement; see the comments for details.
        .mean()
        .alias("mean_abs_score_error_from_1"),
    )
    .sort(["mean_abs_score_error_from_1", "var_on_score"])
I like the ranking method here and I think it makes sense. One concern: if two perturbations have mean errors from 1 of 0.001 and 0.002, but the one with the smaller error has a far larger variance than the other, it would still rank first, which is probably not what we want.
Maybe the variance should be normalized by the error to indicate true consistency?
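A minimal sketch of this concern, using made-up numbers and hypothetical treatment names. It reuses the column names from the diff, but the composite key shown (the square root of squared error plus variance) is only one illustrative way to let variance influence the rank; it is not the normalization proposed above and not what the notebook does.

```python
import math

# Hypothetical per-perturbation summaries; the values are invented for illustration.
rows = [
    {"treatment": "cpd_A", "mean_abs_score_error_from_1": 0.001, "var_on_score": 0.50},
    {"treatment": "cpd_B", "mean_abs_score_error_from_1": 0.002, "var_on_score": 0.0001},
]

# Lexicographic sort, as in the diff: error decides, variance only breaks ties.
lexicographic = sorted(
    rows, key=lambda r: (r["mean_abs_score_error_from_1"], r["var_on_score"])
)


def combined_spread(row):
    """Illustrative composite: sqrt(error^2 + variance), an RMSE-like distance from 1."""
    return math.sqrt(row["mean_abs_score_error_from_1"] ** 2 + row["var_on_score"])


# Composite sort: a large variance can now push a low-error perturbation down.
composite = sorted(rows, key=combined_spread)

print([r["treatment"] for r in lexicographic])  # ['cpd_A', 'cpd_B']
print([r["treatment"] for r in composite])      # ['cpd_B', 'cpd_A']
```

With the lexicographic sort the high-variance compound ranks first; with the composite key it drops to second.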
Either way, I think you have to show the actual rank df so the real variance values of the top-ranked compounds are visible. Currently the notebook has not been run, so they can't be seen.
    # setting random seed for reproducibility
    np.random.seed(rng_seed)
Having a standalone random state, e.g. rng = np.random.RandomState(rng_seed), and calling rng.choice when you need to sample is always better than relying on the global np.random state.
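A small sketch of the standalone-generator pattern, assuming a hypothetical seed value and hypothetical plate IDs. It shows that a draw from a dedicated RandomState is reproducible even when other code reseeds or consumes the global generator in between.

```python
import numpy as np

rng_seed = 0  # hypothetical seed value
plate_ids = ["plate_1", "plate_2", "plate_3"]  # hypothetical plate IDs

# Standalone generator: its stream is independent of np.random's global state.
rng = np.random.RandomState(rng_seed)
selected_plate_id = rng.choice(plate_ids)

# Simulate unrelated code touching the global generator.
np.random.seed(12345)
np.random.rand(10)

# Re-creating the generator with the same seed reproduces the same draw.
replayed_plate_id = np.random.RandomState(rng_seed).choice(plate_ids)
print(selected_plate_id == replayed_plate_id)  # True
```

NumPy's newer Generator API, np.random.default_rng(rng_seed), follows the same pattern and is what the NumPy documentation recommends for new code.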
If Cameron were to review this PR, he would suggest hashing something like the cell line name or some CPJUMP1 experiment-specific ID to get an always-reproducible seed to use as the rng for both cell types.
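One way this could look, as a sketch: the helper name seed_from_label is hypothetical, and the cell line names are used only as example labels. It uses hashlib rather than the built-in hash(), because Python salts string hashes per process, so hash() would not give the same seed across runs.

```python
import hashlib


def seed_from_label(label: str) -> int:
    """Map a string label to a 32-bit integer seed that is stable across runs."""
    digest = hashlib.sha256(label.encode("utf-8")).digest()
    # Keep 4 bytes so the value fits the 0..2**32-1 range NumPy's legacy seeding accepts.
    return int.from_bytes(digest[:4], "big")


# Example labels; any experiment-specific ID string would work the same way.
seeds = {cell_type: seed_from_label(cell_type) for cell_type in ["U2OS", "A549"]}
print(seeds)
```

Each seed could then be passed to a standalone generator, for example np.random.RandomState(seeds["U2OS"]).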
    selected_plate_id = np.random.choice(plate_ids)

    for treatment in tqdm(
        selected_treatments, desc=f"{cell_type} treatments", unit="treatment"
    ):
If we ever want k-fold, we should just exhaust all plate_ids, setting each one as the reference in turn and all others as the titration sample pool. Maybe ask Greg if he thinks k-fold titration would be worth the effort?
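A sketch of what exhausting the plates could look like, assuming a hypothetical helper leave_one_plate_out and made-up plate IDs. Each plate serves as the reference exactly once, with the remaining plates forming the titration sample pool.

```python
from typing import Iterator, List, Tuple


def leave_one_plate_out(plate_ids: List[str]) -> Iterator[Tuple[str, List[str]]]:
    """Yield (reference_plate, pool_plates), using each plate as reference once."""
    for reference in plate_ids:
        pool = [p for p in plate_ids if p != reference]
        yield reference, pool


plate_ids = ["plate_1", "plate_2", "plate_3"]  # hypothetical plate IDs
folds = list(leave_one_plate_out(plate_ids))

for reference, pool in folds:
    print(reference, pool)
```

The existing per-treatment loop would then run once per fold instead of once for a single randomly chosen plate, so runtime grows linearly with the number of plates.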
    ).sample(
        fraction=negcon_subsample_fraction,
        seed=iter_seed,
        with_replacement=False,
For smaller numbers of single cells, I think sampling with replacement will probably make the titration readouts more realistic, because with small sample sizes at larger titration proportions you essentially get the exact same sample, which can cause artificial stability.
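The effect can be illustrated with a toy pool, assuming 20 hypothetical cells and a titration fraction of 1.0. The sketch uses Python's standard random module rather than the DataFrame sampling call in the diff, so it only demonstrates the sampling behavior, not the notebook's code.

```python
import random

cells = list(range(20))  # hypothetical pool of 20 single cells
fraction = 1.0           # largest titration proportion
n_draw = int(len(cells) * fraction)


def draw(seed, with_replacement):
    """Return one sorted sample of n_draw cells for the given seed."""
    rng = random.Random(seed)
    if with_replacement:
        return sorted(rng.choices(cells, k=n_draw))
    return sorted(rng.sample(cells, n_draw))


# Without replacement at fraction 1.0, every iteration is the full pool.
no_repl = [draw(seed, with_replacement=False) for seed in range(3)]

# With replacement, each iteration is a bootstrap resample: some cells repeat
# and others drop out, so the iterations differ from one another.
with_repl = [draw(seed, with_replacement=True) for seed in range(3)]

print(all(sample == cells for sample in no_repl))
print([len(set(sample)) for sample in with_repl])
```

Without replacement, the seed has no effect on which cells are drawn at this fraction, so any variation in the score across iterations cannot come from the perturbed-cell sample itself.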
This pull request introduces a titration analysis for the top 3 compounds across both cell types in the cpjump1 analysis module.
Key updates include:
- Added the titration analysis notebook (5.cpjump1-titration-analysis.ipynb) and its corresponding Python script.
- Updated pyproject.toml.