Skip to content

bug fixed when no data is available for mzml statistics #665

Merged
ypriverol merged 10 commits into
mainfrom
dev
Apr 14, 2026
Merged

bug fixed when no data is available for mzml statistics #665
ypriverol merged 10 commits into
mainfrom
dev

Conversation

@ypriverol
Copy link
Copy Markdown
Member

@ypriverol ypriverol commented Apr 14, 2026

Pull Request

Description

Brief description of the changes made in this PR.

Type of Change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation update
  • Performance improvement
  • Code refactoring
  • Test addition/update
  • Updates to the dependencies has been done.

Summary by CodeRabbit

Release Notes

  • Documentation

    • Added comprehensive reference documentation with installation, usage examples, configuration details, and troubleshooting guides for the plugin.
  • Bug Fixes

    • Fixed data processing logic to prevent missing entries in result tables during file parsing.
  • SEO & Analytics

    • Added search engine crawling rules and social media metadata for improved discoverability.
    • Enabled analytics tracking.
  • Tests

    • Expanded test coverage for data parsing scenarios.

ypriverol and others added 10 commits April 12, 2026 16:58
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ne discoverability

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
When quantms runs with `mzml_statistics = false`, no mzML or ms_info
parquet files are produced, so `mzml_table` is empty when idXML parsing
begins. Two crashes resulted:

- `KeyError` in `IdXMLReader._parse_search_file` when writing search
  engine counts into `mzml_table[ms_name]` for a key that was never
  initialized.
- Potential `KeyError` in `QuantMSModule.parse_idxml` when iterating
  `exp_design_runs` and accessing `mzml_table[spectrum_name]`.

Fix by calling `mzml_table.setdefault(ms_name, {})` before any write in
`_parse_search_file`, and guarding the `exp_design_runs` loop with an
`if spectrum_name in mzml_table` check.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Handle the case where exp_design_runs is None (neither experimental
  design nor SDRF file found): fall back to iterating mzml_table.keys()
  to preserve run order instead of crashing with TypeError.
- Add regression tests for IdXMLReader with an initially-empty
  mzml_table (covers both Comet and MSGF idXML fixtures) to prevent
  the KeyError from reappearing.

Addresses Copilot review comments on PR #664.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…zml-files

fix: handle empty mzml_table when mzml_statistics is disabled
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Apr 14, 2026

📝 Walkthrough

Walkthrough

PR adds LLM-accessible documentation files (llms.txt, llms-full.txt, robots.txt), updates site metadata in HTML template and MkDocs configuration with Google Analytics, enhances project metadata in pyproject.toml, and fixes idXML parsing logic to properly initialize mzml_table and handle optional experimental design run ordering.

Changes

Cohort / File(s) Summary
Documentation for LLM agents
docs/llms.txt, docs/llms-full.txt, docs/robots.txt
Adds new documentation files describing pmultiqc tool overview, installation, capabilities, and detailed reference; includes robots.txt with crawling rules and explicit allow directives for LLM/web-crawler agents to access /llms.txt and /llms-full.txt.
Site configuration and metadata
mkdocs.yml, pyproject.toml, docs/overrides/main.html
Updates MkDocs with Google Analytics tracking, enhances project URLs and keywords in pyproject.toml (adds Homepage, Documentation, Repository, Bug Tracker, Changelog), and injects Open Graph image meta tag and JSON-LD SoftwareApplication schema in HTML template footer.
Core idXML parsing logic
pmultiqc/modules/common/ms/idxml.py, pmultiqc/modules/quantms/quantms.py
Fixes mzml_table initialization by calling setdefault() before engine-slot assignment in IdXMLReader; updates QuantMSModule.parse_idxml() to conditionally handle optional exp_design_runs when None, deriving spectrum_names from mzml_table.keys() as fallback.
Test coverage
tests/test_quantms.py
Adds pytest dependency and parametrized regression test test_idxml_reader_empty_mzml_table that validates mzml_table population from empty state, engine-key presence, and num_quant_psms/num_quant_peps fields existence.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

Suggested labels

review-effort-3, documentation, bug-fix

Suggested reviewers

  • yueqixuan

Poem

🐰 Hop-hop, the docs take flight,
LLM agents read tonight!
Tables heal, flows align,
pmultiqc shines so fine! ✨
Review with care, all's right!

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 20.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title correctly identifies the main bug fix: preventing errors when mzml_table data is unavailable or empty, which is the primary issue resolved by the commit changes across idxml.py, quantms.py, and new regression tests.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch dev

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@codacy-production
Copy link
Copy Markdown

Not up to standards ⛔

🔴 Issues 4 high · 4 minor

Alerts:
⚠ 8 issues (≤ 0 issues of at least minor severity)

Results:
8 new issues

Category Results
Documentation 4 minor
Security 4 high

View in Codacy

🟢 Metrics 0 complexity

Metric Results
Complexity 0

View in Codacy

TIP This summary will be updated as you push new changes. Give us feedback

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (1)
tests/test_quantms.py (1)

41-62: Consider adding one module-level regression for run ordering fallback.

A focused test on QuantMSModule.parse_idxml() with exp_design_runs=None and with explicit run ordering would lock in the behavior introduced in this PR at the module integration level.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/test_quantms.py` around lines 41 - 62, Add a module-level regression
test that exercises QuantMSModule.parse_idxml() when exp_design_runs is None but
an explicit run ordering is provided, to ensure the fallback ordering behavior
is preserved; create a new test (similar location to
test_idxml_reader_empty_mzml_table) that constructs a QuantMSModule instance
with exp_design_runs=None and a controlled set of input idXML/paths or fixtures
that specify run order, call parse_idxml() (or the public entry that triggers
it), and assert that the resulting run list/order matches the explicit ordering
and that required keys (e.g., expected_engine_key, "num_quant_psms",
"num_quant_peps") are present for each run; reference the
QuantMSModule.parse_idxml symbol to locate the implementation and reuse or adapt
the idxml fixtures used by test_idxml_reader_empty_mzml_table.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@docs/overrides/main.html`:
- Around line 3-29: The overridden Jinja block extrahead is replacing the parent
theme's head content; update the template to call {{ super() }} inside the {%
block extrahead %} so inherited analytics/metadata and other head scripts are
preserved (locate the {% block extrahead %} in the overrides/main.html and
insert a call to super() before or after your custom meta/script content).

In `@pmultiqc/modules/quantms/quantms.py`:
- Around line 1338-1343: The loop that reassigns keys (using spectrum_names from
self.exp_design_runs or mzml_table.keys()) doesn't reorder an OrderedDict when
it's the same instance; instead, create a new OrderedDict, iterate
spectrum_names and, for each name present in the incoming mzml_table, copy the
entry into the new OrderedDict, then set self.mzml_table to that new OrderedDict
(ensuring you reference mzml_table, self.mzml_table, and self.exp_design_runs in
the change).

---

Nitpick comments:
In `@tests/test_quantms.py`:
- Around line 41-62: Add a module-level regression test that exercises
QuantMSModule.parse_idxml() when exp_design_runs is None but an explicit run
ordering is provided, to ensure the fallback ordering behavior is preserved;
create a new test (similar location to test_idxml_reader_empty_mzml_table) that
constructs a QuantMSModule instance with exp_design_runs=None and a controlled
set of input idXML/paths or fixtures that specify run order, call parse_idxml()
(or the public entry that triggers it), and assert that the resulting run
list/order matches the explicit ordering and that required keys (e.g.,
expected_engine_key, "num_quant_psms", "num_quant_peps") are present for each
run; reference the QuantMSModule.parse_idxml symbol to locate the implementation
and reuse or adapt the idxml fixtures used by
test_idxml_reader_empty_mzml_table.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 25df8771-f781-486e-8a10-df009e306573

📥 Commits

Reviewing files that changed from the base of the PR and between f69a297 and cc7d222.

⛔ Files ignored due to path filters (1)
  • docs/images/social-card.png is excluded by !**/*.png
📒 Files selected for processing (9)
  • docs/llms-full.txt
  • docs/llms.txt
  • docs/overrides/main.html
  • docs/robots.txt
  • mkdocs.yml
  • pmultiqc/modules/common/ms/idxml.py
  • pmultiqc/modules/quantms/quantms.py
  • pyproject.toml
  • tests/test_quantms.py

Comment thread docs/overrides/main.html
Comment on lines +3 to +29
{% block extrahead %}
<meta property="og:image" content="https://pmultiqc.quantms.org/images/social-card.png">
<script type="application/ld+json">
{
"@context": "https://schema.org",
"@type": "SoftwareApplication",
"name": "pmultiqc",
"description": "MultiQC plugin for proteomics quality control. Generates interactive HTML reports from quantms and quantmsdiann pipeline output.",
"applicationCategory": "Scientific Software",
"applicationSubCategory": "Proteomics",
"operatingSystem": "Linux, macOS, Windows",
"url": "https://pmultiqc.quantms.org",
"downloadUrl": "https://pypi.org/project/pmultiqc/",
"license": "https://opensource.org/licenses/MIT",
"author": {
"@type": "Organization",
"name": "BigBio",
"url": "https://quantms.org"
},
"citation": {
"@type": "ScholarlyArticle",
"name": "pmultiqc: An open-source, lightweight, and metadata-oriented QC reporting library for MS proteomics",
"url": "https://doi.org/10.1016/j.mcpro.2026.101530"
}
}
</script>
{% endblock %}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🌐 Web query:

For MkDocs Material template overrides, should {% block extrahead %}include{{ super() }} to preserve parent-provided head metadata/scripts (including analytics integrations)?

💡 Result:

Yes. When overriding Material for MkDocs template blocks (including extrahead), include {{ super }} inside your {% block extrahead %} if you want to preserve any parent-provided head metadata/scripts; otherwise you may replace the block content entirely and accidentally drop integrations. Material’s own customization docs explicitly recommend using {{ super }} when you “intend to add something to a block rather than to replace it altogether,” and they list extrahead as an overrideable block. MkDocs’ theme customization docs also show that overriding a block without calling super replaces the parent block content, so super is the mechanism to retain it (e.g., for additional scripts).

Citations:


Add {{ super() }} to preserve parent theme integrations.

Overriding {% block extrahead %} at line 3 without {{ super() }} will drop parent theme integrations (analytics, metadata scripts, and other head content). Material for MkDocs requires {{ super() }} to retain inherited head content.

🔧 Proposed fix
 {% block extrahead %}
+{{ super() }}
 <meta property="og:image" content="https://pmultiqc.quantms.org/images/social-card.png">
 <script type="application/ld+json">
 {
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
{% block extrahead %}
<meta property="og:image" content="https://pmultiqc.quantms.org/images/social-card.png">
<script type="application/ld+json">
{
"@context": "https://schema.org",
"@type": "SoftwareApplication",
"name": "pmultiqc",
"description": "MultiQC plugin for proteomics quality control. Generates interactive HTML reports from quantms and quantmsdiann pipeline output.",
"applicationCategory": "Scientific Software",
"applicationSubCategory": "Proteomics",
"operatingSystem": "Linux, macOS, Windows",
"url": "https://pmultiqc.quantms.org",
"downloadUrl": "https://pypi.org/project/pmultiqc/",
"license": "https://opensource.org/licenses/MIT",
"author": {
"@type": "Organization",
"name": "BigBio",
"url": "https://quantms.org"
},
"citation": {
"@type": "ScholarlyArticle",
"name": "pmultiqc: An open-source, lightweight, and metadata-oriented QC reporting library for MS proteomics",
"url": "https://doi.org/10.1016/j.mcpro.2026.101530"
}
}
</script>
{% endblock %}
{% block extrahead %}
{{ super() }}
<meta property="og:image" content="https://pmultiqc.quantms.org/images/social-card.png">
<script type="application/ld+json">
{
"@context": "https://schema.org",
"@type": "SoftwareApplication",
"name": "pmultiqc",
"description": "MultiQC plugin for proteomics quality control. Generates interactive HTML reports from quantms and quantmsdiann pipeline output.",
"applicationCategory": "Scientific Software",
"applicationSubCategory": "Proteomics",
"operatingSystem": "Linux, macOS, Windows",
"url": "https://pmultiqc.quantms.org",
"downloadUrl": "https://pypi.org/project/pmultiqc/",
"license": "https://opensource.org/licenses/MIT",
"author": {
"@type": "Organization",
"name": "BigBio",
"url": "https://quantms.org"
},
"citation": {
"@type": "ScholarlyArticle",
"name": "pmultiqc: An open-source, lightweight, and metadata-oriented QC reporting library for MS proteomics",
"url": "https://doi.org/10.1016/j.mcpro.2026.101530"
}
}
</script>
{% endblock %}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/overrides/main.html` around lines 3 - 29, The overridden Jinja block
extrahead is replacing the parent theme's head content; update the template to
call {{ super() }} inside the {% block extrahead %} so inherited
analytics/metadata and other head scripts are preserved (locate the {% block
extrahead %} in the overrides/main.html and insert a call to super() before or
after your custom meta/script content).

Comment on lines +1338 to +1343
# mass spectrum files sorted based on experimental file when available;
# otherwise preserve the mzml_table iteration order.
spectrum_names = self.exp_design_runs if self.exp_design_runs is not None else mzml_table.keys()
for spectrum_name in spectrum_names:
if spectrum_name in mzml_table:
self.mzml_table[spectrum_name] = mzml_table[spectrum_name]
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "1) Verify call site passes self.mzml_table into parse_idxml:"
rg -n -C2 'parse_idxml\(self\.mzml_table\)' pmultiqc/modules/quantms/quantms.py

echo
echo "2) Verify current reindex block:"
rg -n -C4 'spectrum_names = self\.exp_design_runs if self\.exp_design_runs is not None else mzml_table\.keys\(\)' pmultiqc/modules/quantms/quantms.py

echo
echo "3) Demonstrate reassignment does not reorder keys:"
python - <<'PY'
from collections import OrderedDict
od = OrderedDict([("run_b", {}), ("run_a", {})])
for k in ["run_a", "run_b"]:
    od[k] = od[k]
print("Observed order:", list(od.keys()))
print("Expected if reordered:", ["run_a", "run_b"])
PY

Repository: bigbio/pmultiqc

Length of output: 963


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "=== 1) Find method containing lines 1338-1343 ==="
sed -n '1320,1345p' pmultiqc/modules/quantms/quantms.py | head -30

echo
echo "=== 2) Find method definition context ==="
sed -n '1300,1350p' pmultiqc/modules/quantms/quantms.py | grep -E '^\s*(def |class )' -A1

echo
echo "=== 3) Check what data type mzml_table is initialized as ==="
rg -n 'self\.mzml_table\s*=' pmultiqc/modules/quantms/quantms.py | head -5

echo
echo "=== 4) Check imports to see if OrderedDict is used ==="
head -50 pmultiqc/modules/quantms/quantms.py | grep -i 'import\|from'

echo
echo "=== 5) Full context of lines 1330-1350 ==="
sed -n '1330,1350p' pmultiqc/modules/quantms/quantms.py

Repository: bigbio/pmultiqc

Length of output: 3308


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "=== Check if mzml_table is populated before parse_idxml call ==="
grep -n 'self\.mzml_table\[' pmultiqc/modules/quantms/quantms.py | head -10

echo
echo "=== Confirm alias issue: parameter receives self.mzml_table ==="
sed -n '395,410p' pmultiqc/modules/quantms/quantms.py

Repository: bigbio/pmultiqc

Length of output: 814


Reordering via key reassignment does not work on OrderedDict.

At line 1343, reassigning existing keys on self.mzml_table will not reorder them in insertion order. Since mzml_table is the same OrderedDict instance passed as a parameter, the intended reordering based on exp_design_runs is lost. This leaves the runs in their original order regardless of the loop.

🔧 Proposed fix (build a new OrderedDict in desired order)
-        # mass spectrum files sorted based on experimental file when available;
-        # otherwise preserve the mzml_table iteration order.
-        spectrum_names = self.exp_design_runs if self.exp_design_runs is not None else mzml_table.keys()
-        for spectrum_name in spectrum_names:
-            if spectrum_name in mzml_table:
-                self.mzml_table[spectrum_name] = mzml_table[spectrum_name]
+        # mass spectrum files sorted based on experimental file when available;
+        # otherwise preserve the mzml_table iteration order.
+        spectrum_names = self.exp_design_runs if self.exp_design_runs is not None else mzml_table.keys()
+        reordered = OrderedDict()
+        for spectrum_name in spectrum_names:
+            if spectrum_name in mzml_table:
+                reordered[spectrum_name] = mzml_table[spectrum_name]
+        # keep any runs not present in exp_design_runs at the end
+        for spectrum_name, run_data in mzml_table.items():
+            reordered.setdefault(spectrum_name, run_data)
+        self.mzml_table = reordered
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pmultiqc/modules/quantms/quantms.py` around lines 1338 - 1343, The loop that
reassigns keys (using spectrum_names from self.exp_design_runs or
mzml_table.keys()) doesn't reorder an OrderedDict when it's the same instance;
instead, create a new OrderedDict, iterate spectrum_names and, for each name
present in the incoming mzml_table, copy the entry into the new OrderedDict,
then set self.mzml_table to that new OrderedDict (ensuring you reference
mzml_table, self.mzml_table, and self.exp_design_runs in the change).

@ypriverol ypriverol merged commit 1ef58e4 into main Apr 14, 2026
26 of 27 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Development

Successfully merging this pull request may close these issues.

1 participant