Titration analysis with CPJUMP1 profiles #5

Open: axiomcura wants to merge 7 commits into WayScience:main from axiomcura:titration-analysis

@axiomcura

Member

This pull request introduces a titration analysis for the top 3 compounds across both cell types in the cpjump1 analysis module.

Key updates include:

  • Added a new Jupyter notebook (5.cpjump1-titration-analysis.ipynb) and its corresponding Python script.
  • Updated the main analysis shell script to include the titration analysis pipeline.
  • Added necessary project dependencies to pyproject.toml.
  • Included output results (parquet and plot files) for the titration analysis.
  • Updated pre-commit configurations and lock files to support the new workflow.

Copilot AI left a comment

Pull request overview

This PR adds a CPJUMP1 titration analysis to quantify how Buscar score stability changes as the number of pooled perturbed cells decreases, and wires it into the CPJUMP1 analysis workflow.

Changes:

  • Added a new CPJUMP1 titration analysis notebook and its nbconverted Python script, producing parquet + plot outputs.
  • Updated the CPJUMP1 runner shell script to execute the new titration step.
  • Added dependencies (e.g., pycytominer, requests, tqdm) and updated lock/pre-commit configuration.

Reviewed changes

Copilot reviewed 5 out of 8 changed files in this pull request and generated 6 comments.

| File | Description |
| --- | --- |
| uv.lock | Locks new dependencies needed for titration analysis and related tooling. |
| pyproject.toml | Adds build-system config and new runtime dependencies for the analysis workflow. |
| notebooks/3.cpjump1-analysis/run-cpjump1-buscar-analysus.sh | Adds execution of the titration analysis script (but currently references incorrect script filenames in the pipeline). |
| notebooks/3.cpjump1-analysis/nbconverted/5.cpjump1-titration-analysis.py | Implements the titration analysis pipeline and writes results/plots (but currently has import/path and avoidable per-iteration I/O issues). |
| notebooks/3.cpjump1-analysis/5.cpjump1-titration-analysis.ipynb | Source notebook for the titration analysis. |
| .pre-commit-config.yaml | Updates Ruff pre-commit revision. |

Comment thread notebooks/3.cpjump1-analysis/run-cpjump1-buscar-analysus.sh
Comment thread notebooks/3.cpjump1-analysis/nbconverted/5.cpjump1-titration-analysis.py Outdated
Comment thread notebooks/3.cpjump1-analysis/nbconverted/5.cpjump1-titration-analysis.py Outdated

@wli51 left a comment

LGTM! I don't think there are any fatal flaws with this experiment, but the analysis could be made more robust by sampling with replacement; see the comments for details.

.mean()
.alias("mean_abs_score_error_from_1"),
)
.sort(["mean_abs_score_error_from_1", "var_on_score"])

I like the ranking method here and I think it makes sense. One concern: if two perturbations have mean errors from 1 of 0.001 and 0.002, but the one with the smaller error has a far larger variance than the second, it would still rank first, which is probably not what we want.

Maybe the variance should be normalized by error to indicate true consistency?
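To make the concern concrete, here is a minimal sketch comparing the lexicographic sort with a single combined metric. It is not the PR's code: the treatment names and scores are made up, the error is assumed to be |mean(score) − 1| (which may not match how `mean_abs_score_error_from_1` is actually computed), and root-mean-square deviation from 1 is swapped in as one possible alternative to "variance normalized by error".

```python
import numpy as np

# Hypothetical per-treatment scores across titration iterations (not PR data).
scores_by_treatment = {
    "low_error_high_variance": [0.4, 1.6, 0.5, 1.5],
    "slightly_higher_error_low_variance": [0.99, 1.01, 0.98, 1.01],
}

def abs_mean_error_from_one(scores):
    """Assumed error definition: |mean(score) - 1|."""
    return abs(float(np.mean(scores)) - 1.0)

def rmse_from_one(scores):
    """Root-mean-square deviation from 1; grows with both bias and spread."""
    arr = np.asarray(scores, dtype=float)
    return float(np.sqrt(np.mean((arr - 1.0) ** 2)))

# Lexicographic sort: error first, variance only breaks ties.
lexicographic_ranking = sorted(
    scores_by_treatment,
    key=lambda t: (
        abs_mean_error_from_one(scores_by_treatment[t]),
        float(np.var(scores_by_treatment[t])),
    ),
)

# Combined metric: a noisy treatment can no longer win on mean error alone.
rmse_ranking = sorted(
    scores_by_treatment, key=lambda t: rmse_from_one(scores_by_treatment[t])
)
```

With these toy numbers the lexicographic sort puts the noisy treatment first, while the combined metric puts the consistent one first.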

Either way, I think you need to show the actual rank DataFrame so the real variance values of the top-ranked compounds are visible. The notebook is currently committed without having been run, so they can't be seen.



# setting random seed for reproducibility
np.random.seed(rng_seed)

Having a standalone random state, e.g. rng = np.random.RandomState(rng_seed), and calling rng.choice when you need to sample is always better than relying on the global NumPy random state.
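A minimal sketch of that suggestion, assuming a placeholder `rng_seed` value and made-up plate IDs (the PR defines its own). It shows the `RandomState` form named above alongside `np.random.default_rng`, the Generator API that NumPy currently recommends for new code; either keeps the sampling independent of the global state.

```python
import numpy as np

rng_seed = 0  # placeholder; the PR sets its own rng_seed
plate_ids = ["plate_a", "plate_b", "plate_c"]  # hypothetical plate IDs

# Standalone legacy-style state, as suggested in the comment.
legacy_rng = np.random.RandomState(rng_seed)
selected_plate_id = legacy_rng.choice(plate_ids)

# Equivalent isolation with the newer Generator API.
rng = np.random.default_rng(rng_seed)
selected_plate_id_generator = rng.choice(plate_ids)

# Re-creating the state with the same seed reproduces the same draw,
# no matter what other code has done to the global np.random state.
np.random.seed(12345)
repeat_plate_id = np.random.RandomState(rng_seed).choice(plate_ids)
```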

If Cameron were to review this PR, he would suggest hashing something like the cell line name or some CPJUMP1 experiment-specific ID to get an always-reproducible seed to use for the RNG of both cell types.
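One way this could look, as a sketch only: the helper name `seed_from_label` and the cell type labels are assumptions, not taken from the PR. It uses `hashlib` rather than Python's built-in `hash()`, because string hashes from `hash()` are salted per process by default and so are not stable between runs.

```python
import hashlib

import numpy as np

def seed_from_label(label: str) -> int:
    """Derive a stable 32-bit seed from a text label (hypothetical helper)."""
    digest = hashlib.sha256(label.encode("utf-8")).digest()
    # The first 4 bytes give an integer in [0, 2**32 - 1], valid for NumPy seeds.
    return int.from_bytes(digest[:4], "big")

# Assumed cell type labels; substitute whatever identifiers the analysis uses.
cell_types = ["U2OS", "A549"]
rngs = {
    cell_type: np.random.default_rng(seed_from_label(cell_type))
    for cell_type in cell_types
}
```

Each cell type then gets its own generator whose seed depends only on the label, so rerunning the notebook or reordering the loop does not change the draws.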

Comment on lines +237 to +241
selected_plate_id = np.random.choice(plate_ids)

for treatment in tqdm(
    selected_treatments, desc=f"{cell_type} treatments", unit="treatment"
):

If we ever want k-fold, we should just exhaust all plate_ids, setting each one as the reference in turn and using all the others as the titration sample pool. Maybe ask Greg whether he thinks k-fold titration would be worth the effort?
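A rough sketch of that leave-one-plate-out idea, assuming made-up plate IDs and a hypothetical helper name; the titration itself would run inside the loop over folds.

```python
def leave_one_plate_out(plate_ids):
    """Yield (reference_plate, pool_plates), using each plate as reference once."""
    plate_ids = list(plate_ids)
    for i, reference in enumerate(plate_ids):
        # Every other plate forms the titration sample pool for this fold.
        pool = plate_ids[:i] + plate_ids[i + 1:]
        yield reference, pool

plate_ids = ["plate_a", "plate_b", "plate_c"]  # hypothetical plate IDs
folds = list(leave_one_plate_out(plate_ids))
```

This replaces the single random plate choice with one fold per plate, so the cost scales with the number of plates.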

).sample(
    fraction=negcon_subsample_fraction,
    seed=iter_seed,
    with_replacement=False,

For smaller numbers of single cells, I think sampling with replacement will probably make the titration readouts more realistic: with small sample sizes at larger titration proportions you essentially get the exact same sample every time, which can cause artificial stability.
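The effect can be illustrated with a small NumPy sketch. It uses a made-up pool of 20 cell indices and the extreme case of a fraction of 1.0, and it is independent of the PR's polars code.

```python
import numpy as np

rng = np.random.default_rng(0)
cell_ids = np.arange(20)  # hypothetical small pool of single cells
fraction = 1.0  # extreme case; fractions close to 1.0 behave similarly
n_cells = int(round(fraction * len(cell_ids)))

# Without replacement, every iteration at this fraction is the whole pool.
draws_without = [
    frozenset(rng.choice(cell_ids, size=n_cells, replace=False).tolist())
    for _ in range(5)
]

# With replacement (a bootstrap), each iteration is a different resample.
draws_with = [
    tuple(sorted(rng.choice(cell_ids, size=n_cells, replace=True).tolist()))
    for _ in range(5)
]

n_distinct_without = len(set(draws_without))  # always 1 at fraction 1.0
n_distinct_with = len(set(draws_with))
```

When every iteration sees the same cells, the score variance across iterations collapses for reasons unrelated to the compound, which is the artificial stability described above.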
