feat: improve scoring workflows for large OSW datasets by singjc · Pull Request #212 · PyProphet/pyprophet

singjc · 2026-06-17T23:31:06Z

This pull request introduces several new configuration options and command-line arguments to enhance the flexibility and control of the scoring process in pyprophet. The main themes are the addition of experimental features for transition scoring and training, improved report generation controls, and better batch processing options. The changes are reflected throughout the configuration, CLI, and reporting code.

Key changes:

Transition scoring and training enhancements

Added new experimental options to control transition scoring features: transition_score_use_mapping_cardinality, transition_score_use_unique_mapping, and transition_score_use_phospho_loss. These allow exposing additional features for transition scoring. [1] [2] [3]
Added options to restrict which transitions are used for semi-supervised training: transition_training_require_unique_mapping, transition_training_require_phospho_loss, transition_training_max_isotope_overlap, and transition_training_min_log_sn. [1] [2] [3]
All new options are exposed via the CLI and integrated into the configuration and argument parsing logic. [1] [2] [3] [4]
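As a rough illustration of how the training-side options could combine, the sketch below groups the option names listed above into a dataclass and applies the enabled filters to a few toy transition rows. The field names come from the description; the defaults, the `TransitionOptions` class, the `keep_for_training` helper, and the row keys are hypothetical and are not taken from the PR's code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TransitionOptions:
    # Option names mirror the PR description; the defaults are assumptions.
    transition_score_use_mapping_cardinality: bool = False
    transition_score_use_unique_mapping: bool = False
    transition_score_use_phospho_loss: bool = False
    transition_training_require_unique_mapping: bool = False
    transition_training_require_phospho_loss: bool = False
    transition_training_max_isotope_overlap: Optional[float] = None
    transition_training_min_log_sn: Optional[float] = None


def keep_for_training(row: dict, opts: TransitionOptions) -> bool:
    """Return True if a transition row passes every enabled training filter.

    The row keys used here are hypothetical column names.
    """
    if opts.transition_training_require_unique_mapping and not row["unique_mapping"]:
        return False
    if opts.transition_training_require_phospho_loss and not row["phospho_loss"]:
        return False
    max_overlap = opts.transition_training_max_isotope_overlap
    if max_overlap is not None and row["isotope_overlap"] > max_overlap:
        return False
    min_sn = opts.transition_training_min_log_sn
    if min_sn is not None and row["log_sn"] < min_sn:
        return False
    return True


opts = TransitionOptions(
    transition_training_require_unique_mapping=True,
    transition_training_min_log_sn=1.0,
)
rows = [
    {"unique_mapping": True, "phospho_loss": False, "isotope_overlap": 0.1, "log_sn": 2.0},
    {"unique_mapping": False, "phospho_loss": False, "isotope_overlap": 0.0, "log_sn": 3.0},
    {"unique_mapping": True, "phospho_loss": True, "isotope_overlap": 0.2, "log_sn": 0.5},
]
# Only rows that satisfy all enabled filters remain candidates for training.
kept = [r for r in rows if keep_for_training(r, opts)]
```

Filters left at their defaults are skipped entirely, so turning none of them on keeps every row.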

Report generation improvements

Introduced a report_mode option (with choices: 'auto', 'full', 'main', 'none') to control the scope of the PDF report. 'auto' selects 'main' for large experiments, and 'full' otherwise. 'none' disables report generation. [1] [2] [3] [4] [5] [6]
The CLI and config now handle report_mode, and the report-writing logic respects this setting, skipping report generation if set to 'none'. [1] [2]
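The 'auto' behaviour described above can be sketched as a small resolver. The description does not state what counts as a "large" experiment, so the run-count criterion, the `large_threshold` value, and the `resolve_report_mode` name are assumptions made for illustration only.

```python
VALID_REPORT_MODES = ("auto", "full", "main", "none")


def resolve_report_mode(report_mode: str, n_runs: int, large_threshold: int = 100) -> str:
    """Map a requested report mode to the mode that is actually used.

    'auto' picks 'main' for large experiments and 'full' otherwise.
    Using the run count with a fixed threshold is an assumption.
    """
    if report_mode not in VALID_REPORT_MODES:
        raise ValueError(f"unknown report_mode: {report_mode!r}")
    if report_mode == "auto":
        return "main" if n_runs >= large_threshold else "full"
    return report_mode


def should_write_report(effective_mode: str) -> bool:
    # 'none' disables report generation entirely.
    return effective_mode != "none"
```

Resolving 'auto' once, up front, means the report-writing code only ever has to handle the three concrete modes.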

Batch processing and filtering

Added apply_weights_run_batch_size to control how many runs are processed together when applying weights, with CLI and config support. [1] [2] [3] [4]
Added run_id_filter to RunnerIOConfig to allow filtering by run ID, and integrated it into the argument parsing and config serialization. [1] [2]
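To make the interaction between the two options concrete, here is a minimal sketch that first narrows a list of run IDs with a filter and then yields them in fixed-size batches. The `iter_run_batches` helper is hypothetical; only the option names `apply_weights_run_batch_size` and `run_id_filter` come from the description.

```python
from typing import Iterator, List, Optional, Sequence


def iter_run_batches(
    run_ids: Sequence[int],
    batch_size: int,
    run_id_filter: Optional[Sequence[int]] = None,
) -> Iterator[List[int]]:
    """Yield run IDs in batches of at most batch_size, after optional filtering."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    if run_id_filter is not None:
        allowed = set(run_id_filter)
        run_ids = [r for r in run_ids if r in allowed]
    for start in range(0, len(run_ids), batch_size):
        yield list(run_ids[start:start + batch_size])


batches = list(iter_run_batches([1, 2, 3, 4, 5], batch_size=2))
filtered = list(iter_run_batches([1, 2, 3, 4, 5], batch_size=2, run_id_filter=[2, 5]))
```

A smaller batch size lowers peak memory at the cost of more passes over the input.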

Miscellaneous

Improved string representations (__str__, __repr__) of the config classes to include all new options for easier debugging and logging. [1] [2] [3]
Added a save_scorer method stub to the IO base class for future extensibility.
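A stub of this kind usually takes the form of a no-op hook on the base class that individual backends override. The sketch below shows that pattern with hypothetical class names; it does not reproduce the signature used in pyprophet.

```python
class BaseWriter:
    """Hypothetical IO base class with an optional persistence hook."""

    def save_scorer(self, scorer) -> None:
        # No-op by default: backends without scorer persistence inherit this.
        return None


class InMemoryWriter(BaseWriter):
    """Hypothetical backend that opts in by overriding the hook."""

    def __init__(self) -> None:
        self.saved = None

    def save_scorer(self, scorer) -> None:
        self.saved = scorer


base = BaseWriter()
base.save_scorer({"weights": [0.1, 0.2]})  # silently ignored

writer = InMemoryWriter()
writer.save_scorer({"weights": [0.1, 0.2]})
```

Callers can invoke the hook unconditionally, without checking which backend they hold.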

These changes provide more granular control over transition scoring and training, allow for more efficient processing of large experiments, and give users flexibility in report generation and batch processing.

Copilot

Pull request overview

This PR adds new scoring/training configuration knobs aimed at scaling pyprophet score to large OSW datasets, including persisted-scorer streaming apply, run-level filtering, and report generation controls.

Changes:

Added persisted scorer support for OSW workflows and a streamed apply-weights path to reduce memory usage on large multi-run OSW files.
Introduced report_mode (auto|full|main|none) and apply_weights_run_batch_size to control report scope and streamed apply batching.
Added experimental transition scoring/training feature flags and run_id_filter support across config, readers, and tests.
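The persisted-scorer idea can be sketched with `pickle`: serialize a trained scorer once, dropping bulky training-only state, then restore it and score data one batch at a time. The `Scorer` class, its attributes, and the linear scoring rule are stand-ins chosen for illustration and are not the PR's implementation.

```python
import pickle


class Scorer:
    """Hypothetical scorer holding learned weights plus training-only state."""

    def __init__(self, weights, training_cache=None):
        self.weights = weights
        self.training_cache = training_cache

    def __getstate__(self):
        # Drop the training cache so the pickled payload stays compact.
        state = self.__dict__.copy()
        state["training_cache"] = None
        return state

    def score(self, features):
        return sum(w * x for w, x in zip(self.weights, features))


scorer = Scorer([0.5, 2.0], training_cache=list(range(10000)))
payload = pickle.dumps(scorer)
restored = pickle.loads(payload)

# Apply the restored scorer batch by batch instead of loading all rows at once.
feature_batches = ([[1.0, 1.0]], [[2.0, 0.0], [0.0, 1.0]])
streamed = [[restored.score(f) for f in batch] for batch in feature_batches]
```

Because only the weights travel with the payload, the apply step never needs the training data in memory.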

Reviewed changes

Copilot reviewed 14 out of 14 changed files in this pull request and generated 2 comments.

Show a summary per file

File	Description
tests/test_pyprophet_score.py	Adds an integration test for persisted-scorer streaming apply with `report_mode=main`.
tests/test_io_scoring.py	Adds tests validating `run_id_filter` behavior for OSW reader subsets.
pyprophet/scoring/semi_supervised.py	Adds transition-training target filters and refactors score-alias handling.
pyprophet/scoring/runner.py	Implements streamed OSW apply using a persisted scorer; defers reader loading for apply path.
pyprophet/scoring/pyprophet.py	Adds compact error-stat lookup for persisted scorers and adjusts pickling payload.
pyprophet/scoring/data_handling.py	Introduces `get_score_alias_columns`, preserves `meta_*` columns, and updates feature-matrix selection.
pyprophet/report.py	Adds `report_mode` support to skip report generation or omit downstream pages.
pyprophet/io/scoring/tsv.py	Respects `report_mode` when deciding whether to generate a PDF report.
pyprophet/io/scoring/split_parquet.py	Adds optional transition metadata/features for mapping/phospho-loss and training filters.
pyprophet/io/scoring/parquet.py	Adds optional transition metadata/features for mapping/phospho-loss and training filters.
pyprophet/io/scoring/osw.py	Adds `run_id_filter` support and transition metadata/features; adds incremental score writing and scorer persistence.
pyprophet/io/_base.py	Adds a no-op `save_scorer` hook and ensures writers respect `report_mode=none`.
pyprophet/cli/score.py	Adds CLI options for new transition flags, `report_mode`, and streamed apply batching; implements auto report selection for large experiments.
pyprophet/_config.py	Extends config dataclasses/serialization to include new flags, `report_mode`, batch size, and `run_id_filter`.


        self._create_indexes()
+        if getattr(self.config, "run_id_filter", None) is not None:
+            logger.info(
+                "Using SQLite read path for run-scoped OSW access."
+            )
+            con = sqlite3.connect(self.infile)
+            return self._read_using_sqlite(con)
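For context on what a run-scoped SQLite read might look like, the following sketch queries a toy table with a parameterized `IN` clause and closes the connection when done. The toy `FEATURE` table, its columns, and the `read_features_for_runs` helper are assumptions for illustration; the actual `_read_using_sqlite` method is not shown in the hunk above.

```python
import sqlite3
from contextlib import closing


def read_features_for_runs(con: sqlite3.Connection, run_ids):
    """Return (ID, RUN_ID) rows restricted to the given run IDs."""
    run_ids = list(run_ids)
    if not run_ids:
        return []
    # One placeholder per run ID keeps the query parameterized.
    placeholders = ",".join("?" for _ in run_ids)
    query = (
        "SELECT ID, RUN_ID FROM FEATURE "
        f"WHERE RUN_ID IN ({placeholders}) ORDER BY ID"
    )
    return con.execute(query, run_ids).fetchall()


# closing() ensures the connection is released even if the query fails.
with closing(sqlite3.connect(":memory:")) as con:
    con.execute("CREATE TABLE FEATURE (ID INTEGER PRIMARY KEY, RUN_ID INTEGER)")
    con.executemany(
        "INSERT INTO FEATURE VALUES (?, ?)",
        [(1, 10), (2, 20), (3, 10), (4, 30)],
    )
    rows = read_features_for_runs(con, [10, 30])
```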


+            con = sqlite3.connect(apply_weights)
            if self.classifier in ("LDA", "SVM"):
                try:
-                    con = sqlite3.connect(apply_weights)
-
                    if not check_sqlite_table(con, "PYPROPHET_WEIGHTS"):
                        raise click.ClickException(


…ests - Updated expected output values in regression tests for multi-split parquet and TSV formats to reflect recent changes in scoring calculations. - Adjusted the float stabilization logic in the `_stabilize_regtest_float` function to clamp values greater than or equal to 1 to three decimal places, ensuring consistent results across different environments while maintaining higher precision for sub-unit scores.
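The stabilization rule described in this commit message can be sketched as follows. The function name `stabilize_float` and the six-digit precision for sub-unit values are assumptions; the message only states that values greater than or equal to 1 are limited to three decimal places while sub-unit scores keep higher precision.

```python
def stabilize_float(value: float, sub_unit_digits: int = 6) -> float:
    """Round a float for regression-test comparison.

    Values >= 1 are rounded to three decimal places; smaller values keep
    more digits (sub_unit_digits is an assumed setting).
    """
    if value >= 1:
        return round(value, 3)
    return round(value, sub_unit_digits)
```

Rounding larger values more coarsely absorbs small platform-dependent differences, while sub-unit scores such as q-values keep enough digits to stay distinguishable.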

feat: improve scoring workflows for large OSW datasets

4be50cc

Copilot AI review requested due to automatic review settings June 17, 2026 23:31

Merge branch 'master' into split/scoring-large-osw

6a13eb6

singjc enabled auto-merge June 17, 2026 23:31

Copilot started reviewing on behalf of singjc June 17, 2026 23:31

Copilot AI reviewed Jun 17, 2026

singjc and others added 8 commits June 17, 2026 20:16

fix: update .gitignore to include tools directory and ensure proper e…

8733442

…xclusion of generated docs

Merge branch 'split/scoring-large-osw' of github.com:singjc/pyprophet…

369290f

… into split/scoring-large-osw

Merge branch 'master' into split/scoring-large-osw

3d2495d

Fix deterministic OSW export ordering and close SQLite connections

b8e045c

Merge branch 'split/scoring-large-osw' of github.com:singjc/pyprophet…

9649d53

… into split/scoring-large-osw

fix: improve float stabilization for deterministic testing and update…

5d4de90

… output files

Merge branch 'master' into split/scoring-large-osw

1694ae0

singjc wants to merge 10 commits into PyProphet:master from singjc:split/scoring-large-osw
