feat(workstream-c): cheatsheet categorization and grouping by shreeshtripurwarcomp23-coder · Pull Request #932 · OWASP/OpenCRE

shreeshtripurwarcomp23-coder · 2026-06-14T17:45:20Z

Workstream C — Categorization and Optional Grouping

Closes Issue C from the RFC: Autonomous LLM Pipeline for OWASP Cheat Sheet to CRE Mapping

What this PR delivers

categorize_cheatsheet(record) — labels cheat sheets using a 29-label controlled taxonomy via deterministic keyword matching
group_cheatsheets(records) — groups cheat sheets by category with stable sha256-based group IDs
LLM-optional path with full fallback to deterministic mode on any failure
50 tests covering all 5 acceptance criteria from RFC Issue C
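
As a rough illustration of the deliverables listed above, the keyword categorizer and its LLM-optional fallback might look like the sketch below. The three-label taxonomy, the keyword rules, and the function names `deterministic_categorize` and `categorize` are hypothetical stand-ins, not the PR's actual 29-label taxonomy or API.

```python
from typing import Callable, Optional

# Hypothetical three-label subset; the PR's real taxonomy has 29 labels.
TAXONOMY = frozenset({"authentication", "cryptography", "injection"})
UNCATEGORIZED = "uncategorized"

# Hypothetical keyword rules, matched against lowercased text.
KEYWORD_RULES = [
    ("password", "authentication"),
    ("login", "authentication"),
    ("encrypt", "cryptography"),
    ("sql injection", "injection"),
]


def deterministic_categorize(text: str) -> list[str]:
    """Return sorted, de-duplicated labels whose keywords occur in the text."""
    lowered = text.lower()
    labels = {label for keyword, label in KEYWORD_RULES if keyword in lowered}
    return sorted(labels) or [UNCATEGORIZED]


def categorize(
    text: str, llm: Optional[Callable[[str], list[str]]] = None
) -> list[str]:
    """Try the optional LLM first; fall back to keyword rules on any failure."""
    if llm is not None:
        try:
            # Keep only labels that belong to the controlled taxonomy.
            valid = sorted({label for label in llm(text) if label in TAXONOMY})
            if valid:
                return valid
        except Exception:
            pass  # any LLM error falls through to the deterministic path
    return deterministic_categorize(text)
```

Because the deterministic path sorts a set of labels, the same input always yields the same output, and text that matches no rule is labeled `uncategorized` instead of raising.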

Checklist

C1 — Taxonomy defined (29 controlled labels, single source of truth)
C2 — Deterministic rule-based categorizer
C3 — Grouping with stable IDs across repeated runs
C4 — LLM-optional path with safe fallback
C5 — 50 tests, all passing

Acceptance criteria met

Labels only from approved taxonomy
Same input → same output (deterministic)
Group IDs stable across runs
Unknown inputs → uncategorized, no crash
3+ categories + unknown case covered in tests

Note

CheatsheetRecord is currently a local stub matching the RFC contract exactly. Will be replaced with Workstream B's import once their PR merges — one line change, no logic affected.

- Implement categorize_cheatsheet() with 29-label controlled taxonomy
- Implement group_cheatsheets() with stable sha256-based group IDs
- Deterministic keyword/rule baseline, no LLM dependency
- LLM-optional path with safe fallback on failure
- 50 tests covering all acceptance criteria from RFC Issue C

CheatsheetRecord uses local stub pending Workstream B merge.

coderabbitai · 2026-06-14T17:45:32Z

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

@coderabbitai resume to resume automatic reviews.
@coderabbitai review to trigger a single review.

Summary by CodeRabbit

New Features
- Added cheatsheet categorization and grouping system with intelligent keyword-based organization.
- Improved smartlink navigation: direct redirect to single linked resource instead of intermediate page.
Tests
- Added comprehensive test suite for cheatsheet categorization and grouping functionality.
Chores
- Enhanced telemetry data with import run records and event tracking.

Walkthrough

This PR adds a new cheatsheet_categorizer module with a controlled taxonomy, deterministic keyword-to-label matching, optional LLM-based labeling with safe fallback, and stable grouping via CheatsheetGroup. A corresponding test suite covers all code paths. Separately, the smartlink route gains a fast-path redirect to /cre/<id> when a node has exactly one CRE link.

Changes

Cheatsheet Categorizer Module

Layer / File(s)	Summary
Taxonomy, keyword rules, and data contracts `application/utils/external_project_parsers/parsers/cheatsheet_categorizer.py`	Defines module context, `TAXONOMY` and `UNCATEGORIZED` sentinel, ordered keyword-to-label rules, `CheatsheetRecord` dataclass with `__post_init__` validation, and `CheatsheetGroup` dataclass with deterministic `make_group_id` via truncated SHA-256.
`categorize_cheatsheet`, `group_cheatsheets`, and internal helpers `application/utils/external_project_parsers/parsers/cheatsheet_categorizer.py`	Implements `categorize_cheatsheet` with optional LLM invocation, label validation, and deterministic fallback; `group_cheatsheets` bucketing by `group_id` and sorting; `_build_searchable_text`, `_deterministic_categorize` keyword scan, and `_validate_labels` filtering.
Cheatsheet categorizer test suite `application/tests/test_cheatsheet_categorizer.py`	Full unit test suite covering taxonomy integrity, deterministic categorization with sorting/deduplication, `UNCATEGORIZED` fallback, all LLM-path variants with injection and fallback scenarios, `group_cheatsheets` behavior and ordering, `make_group_id` determinism and format, `_validate_labels` filtering, and `_deterministic_categorize` expectations.

Smartlink Single-CRE Redirect

Layer / File(s)	Summary
Single-CRE fast-path in smartlink route and test update `application/web/web_main.py`, `application/tests/web_main_test.py`	Inserts a redirect to `/cre/<cre_id>` in the `smartlink` handler when the resolved node has exactly one CRE link; updates `test_smartlink` to assert 302 redirects to `/cre/` pages for single-CRE cases and adds a second linked CRE fixture for the multi-CRE path.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 43.48% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title clearly and concisely summarizes the main changes: implementing cheatsheet categorization and grouping functionality as part of Workstream C.
Description check	✅ Passed	The description thoroughly explains the PR's purpose, deliverables, acceptance criteria, and implementation details related to the changeset.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 Biome (2.4.16)

import_telemetry.json

File contains syntax errors that prevent linting: Lines 2-25: End of file expected (the same error is reported once for each line).

coderabbitai

Actionable comments posted: 4

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@application/tests/test_cheatsheet_categorizer.py`:
- Around line 330-344: Both test_same_category_same_group and
test_different_categories_different_groups contain assertions guarded by
conditional statements, allowing them to pass without testing anything if the
categorization unexpectedly changes. Fix this by converting the conditional
guards into direct assertions that verify the expected label relationships
first. In test_same_category_same_group, change the if statement to
self.assertEqual(auth_labels, pwd_labels) so the label equality is asserted
unconditionally, then follow with the grouping assertions. Similarly, in
test_different_categories_different_groups, change the if statement to
self.assertNotEqual(auth_labels, secrets_labels) to unconditionally assert the
labels differ, then follow with the grouping assertions. This ensures these
tests fail if categorization behavior changes unexpectedly.

In
`@application/utils/external_project_parsers/parsers/cheatsheet_categorizer.py`:
- Around line 343-349: The loop variable `l` (lowercase letter L) on lines 343
and 346 violates Ruff E741 linting rules because it is ambiguous and looks like
the number 1. Rename this variable to a more descriptive name like `label`
throughout the code block. Replace `l` in the list comprehension (valid = [l for
l in labels...]), in the for loop declaration (for l in valid:), and in the
conditional check (if l not in seen:) and subsequent operations
(deduped.append(l) and seen.add(l)) with the new variable name to satisfy lint
requirements.
- Around line 170-189: The CheatsheetRecord dataclass documents that required
fields (source, source_id, title, hyperlink, summary, headings,
raw_markdown_path) must be non-empty strings or lists after normalization, but
this constraint is not enforced. Add a __post_init__ method to the
CheatsheetRecord dataclass that validates each required field is non-empty
(non-empty string or non-empty list), raising a descriptive ValueError with
field-level details if any required field is empty or invalid.
- Around line 209-211: The make_group_id function does not properly implement
set-based behavior because it sorts the labels list directly without removing
duplicates. When duplicate labels exist, different label sets produce different
hashes, violating the documented set-based contract. Fix this by converting the
labels parameter to a set before sorting it, so that duplicate labels are
eliminated and semantically equivalent label sets always produce the same stable
hash.
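
The set-based group ID described in the last item could be sketched as follows. This is a minimal illustration, not the PR's implementation: the `|` separator and the 12-character truncation are assumptions, since the review only states that the ID is a truncated SHA-256.

```python
import hashlib


def make_group_id(labels: list[str], length: int = 12) -> str:
    """Hash the de-duplicated, sorted label set into a short hex ID."""
    # set() removes duplicates; sorted() makes the order canonical.
    canonical = "|".join(sorted(set(labels)))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:length]
```

With this shape, `["authentication", "authentication"]` and `["authentication"]` hash to the same ID, which is the behavior the comment asks for.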

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

Run ID: 39b63327-3bdc-452c-bd18-66396c5b7cb8

📥 Commits

Reviewing files that changed from the base of the PR and between e853cd3 and 43cd7e5.

📒 Files selected for processing (4)

application/tests/test_cheatsheet_categorizer.py
application/tests/web_main_test.py
application/utils/external_project_parsers/parsers/cheatsheet_categorizer.py
application/web/web_main.py

coderabbitai

♻️ Duplicate comments (1)

application/utils/external_project_parsers/parsers/cheatsheet_categorizer.py (1)

194-210: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Validate list fields in __post_init__ to prevent categorize_cheatsheet runtime crashes.

CheatsheetRecord.__post_init__ currently guards only required string fields. If headings/category_hints contain non-strings, Line 342 (" ".join(parts)) can raise TypeError, which breaks the “unknown input should not crash” behavior.

Proposed fix

 def __post_init__(self) -> None:
@@
         for fname in required_str_fields:
             value = getattr(self, fname)
             if not isinstance(value, str) or not value.strip():
                 raise ValueError(
                     f"CheatsheetRecord.{fname} must be a non-empty string, "
                     f"got {value!r}"
                 )
+
+        required_list_fields = ["headings", "category_hints"]
+        for fname in required_list_fields:
+            value = getattr(self, fname)
+            if not isinstance(value, list):
+                raise ValueError(f"CheatsheetRecord.{fname} must be a list, got {type(value).__name__}")
+            if any(not isinstance(item, str) for item in value):
+                raise ValueError(f"CheatsheetRecord.{fname} must contain only strings")

Also applies to: 339-342

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@application/utils/external_project_parsers/parsers/cheatsheet_categorizer.py`
around lines 194 - 210, The __post_init__ method in CheatsheetRecord currently
validates only string fields but does not validate the list fields headings and
category_hints. If these list fields contain non-string values, the "
".join(parts) call in categorize_cheatsheet will raise a TypeError at runtime.
Add validation in __post_init__ to ensure that headings and category_hints are
present and contain only string elements, raising a ValueError with a
descriptive message if validation fails. This will prevent runtime crashes when
non-string values are passed in these list fields.
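
A self-contained version of the validation described above might look like this sketch. `Record` is a hypothetical, reduced stand-in for `CheatsheetRecord` with only three fields, and `build_searchable_text` is an assumed shape for the helper that joins the parts.

```python
from dataclasses import dataclass, field


@dataclass
class Record:  # hypothetical, reduced stand-in for CheatsheetRecord
    title: str
    headings: list[str]
    category_hints: list[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        if not isinstance(self.title, str) or not self.title.strip():
            raise ValueError(
                f"Record.title must be a non-empty string, got {self.title!r}"
            )
        for fname in ("headings", "category_hints"):
            value = getattr(self, fname)
            if not isinstance(value, list):
                raise ValueError(
                    f"Record.{fname} must be a list, got {type(value).__name__}"
                )
            if any(not isinstance(item, str) for item in value):
                raise ValueError(f"Record.{fname} must contain only strings")


def build_searchable_text(record: Record) -> str:
    # Safe to join: every part was checked to be a string at construction time.
    parts = [record.title, *record.headings, *record.category_hints]
    return " ".join(parts).lower()
```

Rejecting bad list contents at construction time means the later `" ".join(parts)` call cannot raise `TypeError`; the failure surfaces earlier as a `ValueError` that names the offending field.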

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

Run ID: 0859c5cd-3886-42e4-96f8-08c6c3dfe810

📥 Commits

Reviewing files that changed from the base of the PR and between 484afac and 604e85a.

📒 Files selected for processing (2)

application/tests/test_cheatsheet_categorizer.py
application/utils/external_project_parsers/parsers/cheatsheet_categorizer.py

🚧 Files skipped from review as they are similar to previous changes (1)

application/tests/test_cheatsheet_categorizer.py

…#901) (OWASP#909)

The issue was in CommonRequirementEnumeration.tsx: the collapse logic (slicing the list after 5 items and showing a "Show more" button) was being applied to all link types including CREs, with no distinction between CRE links and links to external standards.

The fix adds an allLinksAreCres check that mirrors the same pattern already correctly implemented in DocumentNode.tsx. When all links in a group are of type CRE, the full list is shown without slicing and the "Show more" button is hidden. Links to external standards continue to collapse as before.

Changes are limited to a single file, CommonRequirementEnumeration.tsx:

- Added DOCUMENT_TYPES import from ../../const
- Compute allLinksAreCres before rendering each link group
- Use visibleResults (full list for CREs, sliced for others) instead of inline slice
- Guard the "Show more" button with !allLinksAreCres

* feat: implement structured extraction checkpoints B1 and B2
* docs: improve formatting
* fix: improve normalization of required string fields
* docs: add docstrings
* fix: validate normalized string field values correctly

Signed-off-by: Abhijeet Saharan <abhijeetsaharan2236@gmail.com>
Signed-off-by: Abhijeet <abhijeetsaharan2236@gmail.com>

…WASP#823)

* Add curated CWE fallback mappings
* Cover CWE fallback and inheritance behavior with tests
* Add local CWE refresh tooling
* Add local helper scripts for issue OWASP#472
* Integrate OpenCRE map analysis support from issue OWASP#469
* Implement fallback for gap analysis in database with error handling
* Update scripts/show-db-stats.sh
* fix: remove leading space in 'xss' keyword for CWE mapping
* fix: update condition for related CWE entries to check for 'ChildOf' nature
* fix: correct syntax for accessing related CWE entry attributes
* fix: enhance gap analysis error handling for Heroku and fallback scenarios

Signed-off-by: Bornunique911 <69379200+Bornunique911@users.noreply.github.com>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>

…nitor PR OWASP#823 reintroduced Neo4j/Redis fallback on Heroku cache misses, causing 503s when Neo4j DNS fails. Serve precomputed GA from Postgres only on Heroku and return 404 on cache miss. Add monitor_ga_health.py for production regression alerting (especially HTTP 503). Fixes OWASP#923 Co-authored-by: Cursor <cursoragent@cursor.com>

Supports postgres-to-postgres sync via temp-table merge for prod tables without a unique index on cache_key. Co-authored-by: Cursor <cursoragent@cursor.com>

Document operational scripts and weekly prod GA checks in AGENTS.md; add link_pci_dss_cre.py for embedding-based CRE linking. Harden primary GA cache key detection, sync script materiality guards, monitor 503 test, and DSN redaction. Co-authored-by: Cursor <cursoragent@cursor.com>

Avoid accidental production writes when running link_pci_dss_cre.py without explicit --cache-file or CRE_CACHE_FILE. Co-authored-by: Cursor <cursoragent@cursor.com>

Guard against an empty get_CREs result so callers get None instead of IndexError when a DB row exists but no matching CRE document is found. Co-authored-by: Cursor <cursoragent@cursor.com>

Serve precomputed OpenCRE GA from cache on Heroku instead of computing on the web dyno, expand backfill to include automatic CRE links, and harden PCI DSS / Secure Headers imports with better linking and parser fixes. Co-authored-by: Cursor <cursoragent@cursor.com>

Harden PCI env parsing, tighten sync script safety checks, make bridge fallback tests deterministic, and format files flagged by CI black. Co-authored-by: Cursor <cursoragent@cursor.com>

Co-authored-by: Cursor <cursoragent@cursor.com>

Track AGENTS.md and .cursor/rules so the team shares human-plan-then-agent-execute workflows, CI/PR policies, and domain safety guardrails. Co-authored-by: Cursor <cursoragent@cursor.com>

Add modular .cursor/rules for requirements gates, tickets, TDD, and verification; tighten plan-first and multi-agent flows; slim AGENTS.md to an index aligned with make lint/mypy/test checks. Co-authored-by: Cursor <cursoragent@cursor.com>

The test expected tags="1,2" but dbNodeFromCode joins the input list ["111-111", "222-222"] with commas, producing "111-111,222-222". The expected value in the test was wrong.

Signed-off-by: Arpit Jain <arpitjain099@gmail.com>

Closes OWASP#862 request.args.get('text') returns None if the query param is absent. Passing None into db.text_search() causes re.search() to raise TypeError: expected string or bytes-like object. Return a 400 before reaching the database call.
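
The guard described in this commit can be illustrated without a web framework. In the sketch below, `text_search` is a hypothetical stand-in for `db.text_search()`, and `handle_text_search` takes a plain dict in place of `request.args`; the response shape is an assumption.

```python
import re
from typing import Optional


def text_search(text: str) -> bool:  # hypothetical stand-in for db.text_search()
    # re.search raises TypeError if text is None, so callers must guard first.
    return re.search(r"\d{3}-\d{3}", text) is not None


def handle_text_search(query_args: dict) -> tuple:
    """Return (body, status); reject a missing 'text' before searching."""
    text: Optional[str] = query_args.get("text")
    if text is None:
        return {"error": "missing 'text' query parameter"}, 400
    return {"matched": text_search(text)}, 200
```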

Replace get_all_records() with get_all_values() to bypass gspread's numericise() which converts section codes like '5.10' to float 5.1. get_all_values() returns raw strings, preserving trailing zeros. Fixes OWASP#574 Fixes OWASP#546
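
The data loss described here comes from ordinary float conversion. The snippet below does not call gspread; it only shows, in plain Python, what happens to a section code once it has been converted to a number.

```python
raw_cell = "5.10"            # section code as stored in the sheet
as_float = float(raw_cell)   # what a numeric conversion does to the cell
round_trip = str(as_float)   # the trailing zero is gone

# Section 5.10 is now indistinguishable from section 5.1.
collides_with_5_1 = round_trip == "5.1"
```

Keeping every cell as a raw string avoids the collision.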

Align spreadsheet_test with get_all_values-based read path so section codes like 5.10 stay strings instead of being float-coerced. Co-authored-by: Cursor <cursoragent@cursor.com>

Handle empty worksheets and pad short rows so section IDs are preserved as strings without IndexError or truncated dict keys.

Fail fast with GSpreadException when worksheet header row contains duplicates, use zip(strict=True) after row padding, and add regression tests for the helper and read_spreadsheet integration.

Fix production H12 timeouts from GET /all_cres?per_page=1000 by batching N+1 link hydration in the DB layer, capping per_page at 100, scoping DataProvider to Explorer routes with incremental page loads, and using ensureFullExplorerData for graph views. Closes OWASP#930. Related: OWASP#847, OWASP#848. Co-authored-by: Cursor <cursoragent@cursor.com>

Fix buildTree sibling keyPath mutation, serialize loadPage via promise chain with exposed dataLoadError, hoist Explorer layout wrappers, surface load failures in graph views, restore viewport zoom, and harden pagination link parity test. Co-authored-by: Cursor <cursoragent@cursor.com>

coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (2)

application/utils/external_project_parsers/parsers/cheatsheet_categorizer.py (2)

341-341: ⚡ Quick win

Prefer iterable unpacking for list construction.

As flagged by Ruff RUF005, iterable unpacking is more idiomatic than list concatenation in Python.

♻️ Proposed refactor

-    parts = [record.title] + record.headings + record.category_hints
+    parts = [record.title, *record.headings, *record.category_hints]

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@application/utils/external_project_parsers/parsers/cheatsheet_categorizer.py`
at line 341, The list construction at the assignment to parts uses the +
operator for list concatenation, which is less idiomatic than iterable unpacking
in Python. Replace the list concatenation using + operators with iterable
unpacking syntax by converting the assignment to use a single list literal with
the record.title element followed by *record.headings and *record.category_hints
unpacking operators to combine all the elements into the parts list.

Source: Linters/SAST tools
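
Beyond style, the two forms differ in what they accept. The sketch below uses made-up values and deliberately makes `category_hints` a tuple to show the difference.

```python
title = "Password Storage"
headings = ["Hashing", "Salting"]
category_hints = ("authentication",)  # a tuple, to show the difference

# Unpacking accepts any iterable.
parts = [title, *headings, *category_hints]

# Concatenation only works list-to-list; adding a tuple raises TypeError.
try:
    [title] + headings + category_hints
    concat_failed = False
except TypeError:
    concat_failed = True
```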

318-318: 💤 Low value

Consider adding specific type hint for bucket.

The generic dict type hint could be more specific as dict[str, CheatsheetGroup] for better type safety and IDE support.

♻️ Proposed type hint improvement

-    bucket: dict = {}
+    bucket: dict[str, CheatsheetGroup] = {}

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@application/utils/external_project_parsers/parsers/cheatsheet_categorizer.py`
at line 318, The `bucket` variable in the cheatsheet_categorizer.py file is
using a generic `dict` type hint, which lacks specificity and reduces IDE type
safety. Replace the type hint from `dict` to `dict[str, CheatsheetGroup]` to
explicitly indicate that the bucket dictionary maps string keys to
CheatsheetGroup values. Ensure that CheatsheetGroup is properly imported or
available in the scope where this change is made.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In
`@application/utils/external_project_parsers/parsers/cheatsheet_categorizer.py`:
- Around line 194-210: The __post_init__ method in the CheatsheetRecord class
validates required string fields but omits validation for the required headings
field. Add validation after the existing string field checks to ensure the
headings field is present and is a non-empty list, raising a ValueError with an
appropriate message if it is missing, not a list, or empty, maintaining
consistency with the error message format used for string field validation.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

Run ID: 2c123407-b815-4a80-a0a0-d553c11b3de7

📥 Commits

Reviewing files that changed from the base of the PR and between 604e85a and 99be263.

📒 Files selected for processing (4)

application/tests/test_cheatsheet_categorizer.py
application/tests/web_main_test.py
application/utils/external_project_parsers/parsers/cheatsheet_categorizer.py
application/web/web_main.py

🚧 Files skipped from review as they are similar to previous changes (3)

application/tests/web_main_test.py
application/web/web_main.py
application/tests/test_cheatsheet_categorizer.py

coderabbitai · 2026-06-14T19:03:17Z

This command requires write access to the repository. Only users with write or admin permissions can trigger CodeRabbit to commit or create pull requests.

Bornunique911 · 2026-06-14T19:30:29Z

@coderabbitai review

coderabbitai · 2026-06-14T19:30:35Z

✅ Action performed

Review finished.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

coderabbitai

♻️ Duplicate comments (1)

application/utils/external_project_parsers/parsers/cheatsheet_categorizer.py (1)
194-210: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Missing validation for required headings field.

This issue was already flagged in a previous review comment. The __post_init__ method validates required string fields but does not check the headings field. According to line 181, "Required fields must be non-empty strings / lists after normalisation," but headings (line 189) is a required field with no default that is not validated. An empty list or non-list value would violate the contract.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@application/utils/external_project_parsers/parsers/cheatsheet_categorizer.py`
around lines 194 - 210, The `__post_init__` method in the CheatsheetRecord class
validates required string fields but is missing validation for the `headings`
field, which is also a required field according to the class design. Add
validation logic after the existing string field validation loop to check that
`headings` is a non-empty list and raise a ValueError with a descriptive message
if it is not a list, is empty, or has no valid content, following the same
pattern and structure as the string field validation checks above it.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

Run ID: bdc441f9-0dab-4e63-9b95-9b14ab8ce2fb

📥 Commits

Reviewing files that changed from the base of the PR and between 99be263 and 86088d4.

📒 Files selected for processing (2)

application/tests/web_main_test.py
application/utils/external_project_parsers/parsers/cheatsheet_categorizer.py

🚧 Files skipped from review as they are similar to previous changes (1)

application/tests/web_main_test.py

shreeshtripurwarcomp23-coder · 2026-06-14T20:18:10Z

@coderabbitai review

coderabbitai · 2026-06-14T20:18:17Z

✅ Action performed

Review finished.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (1)

import_telemetry.json (1)
1-25: 💤 Low value

Biome parse errors are false positives for NDJSON format.

The static analysis tool Biome (a JavaScript/TypeScript linter) is reporting parse errors because it expects a single JSON object, but this file uses NDJSON (newline-delimited JSON) format where each line is a separate JSON object. This is the correct format per the upstream emit_import_event implementation that writes json.dumps(event) + "\n".

If this file is meant to be committed, consider excluding it from Biome checks by adding it to the Biome ignore configuration, or document that these errors are expected for NDJSON files.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@import_telemetry.json` around lines 1 - 25, The import_telemetry.json file
contains NDJSON format (newline-delimited JSON) where each line is a separate
valid JSON object, which matches the upstream emit_import_event implementation
that writes json.dumps(event) + "\n". Biome is incorrectly reporting parse
errors because it expects a single JSON object. Add the import_telemetry.json
file to the Biome ignore configuration in your biome.json config file by adding
the file path to the ignore patterns or files list, so Biome will skip linting
this file and these false positive errors will be resolved.
Source: Linters/SAST tools
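
The difference between a single JSON document and NDJSON can be reproduced with the standard library. The two events below are invented for illustration and are not taken from `import_telemetry.json`.

```python
import json

# Two hypothetical telemetry events, one JSON object per line.
ndjson_text = (
    '{"event": "import_started", "run_id": 1}\n'
    '{"event": "import_finished", "run_id": 1}\n'
)

# Parsing the whole file as one JSON document fails at the second line.
try:
    json.loads(ndjson_text)
    whole_file_ok = True
except json.JSONDecodeError:
    whole_file_ok = False

# Parsing line by line succeeds.
events = [json.loads(line) for line in ndjson_text.splitlines() if line.strip()]
```

A linter that expects one JSON value per file fails in the same way as the first `json.loads` call, which is consistent with the "End of file expected" errors reported from line 2 onward.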

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@import_telemetry.json`:
- Around line 1-25: The file import_telemetry.json contains transient runtime
telemetry data generated by the telemetry.py module during test execution and
should not be committed to version control. Delete import_telemetry.json from
the repository and add the filename to the .gitignore file to prevent accidental
commits of future telemetry artifacts generated during local development.

---

Nitpick comments:
In `@import_telemetry.json`:
- Around line 1-25: The import_telemetry.json file contains NDJSON format
(newline-delimited JSON) where each line is a separate valid JSON object, which
matches the upstream emit_import_event implementation that writes
json.dumps(event) + "\n". Biome is incorrectly reporting parse errors because it
expects a single JSON object. Add the import_telemetry.json file to the Biome
ignore configuration in your biome.json config file by adding the file path to
the ignore patterns or files list, so Biome will skip linting this file and
these false positive errors will be resolved.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

Run ID: 6897a458-7002-46e1-a2b9-1804bd249296

📥 Commits

Reviewing files that changed from the base of the PR and between 99be263 and 931dace.

📒 Files selected for processing (3)

application/tests/web_main_test.py
application/utils/external_project_parsers/parsers/cheatsheet_categorizer.py
import_telemetry.json

🚧 Files skipped from review as they are similar to previous changes (1)

application/utils/external_project_parsers/parsers/cheatsheet_categorizer.py

shreeshtripurwarcomp23-coder added 2 commits May 27, 2026 03:50

fix(smartlink): redirect to CRE page directly when only one CRE is li…

fe2e5ce

…nked (issue OWASP#486)

coderabbitai Bot reviewed Jun 14, 2026

fix(workstream-c): apply black formatting

d93afbf

coderabbitai Bot reviewed Jun 14, 2026

shreeshtripurwarcomp23-coder and others added 24 commits June 14, 2026 23:59

fix(scripts): send User-Agent in GA health monitor HTTP requests

32c4864

Cloudflare blocks anonymous urllib requests to ga_standards on production. Co-authored-by: Cursor <cursoragent@cursor.com>
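
A minimal sketch of the idea in this commit, using only the standard library: attach an explicit User-Agent header to a `urllib` request instead of relying on the default. The helper name `build_request`, the URL, and the User-Agent string are placeholders, not values from the actual health monitor script.

```python
import urllib.request

# Hypothetical identifier; the real script may send a different string.
USER_AGENT = "opencre-ga-health-monitor/0.1"


def build_request(url):
    """Return a Request that carries an explicit User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": USER_AGENT})


# Placeholder URL; no network call is made in this sketch.
req = build_request("https://example.org/health")
```

The request can then be sent with `urllib.request.urlopen(req, timeout=10)`. Note that `Request` normalizes header names, so the stored key reads `User-agent`.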

docs(agents): track AGENTS.md with production GA cache-only policy

5b56c4c

Allow AGENTS.md through the *.md gitignore exception and document that Heroku/opencreorg gap analysis is cache-only (no compute on production). Co-authored-by: Cursor <cursoragent@cursor.com>

fix(ga): block empty primary cache writes and clobbering

73ba870

Guard add_gap_analysis_result so non-material {"result":{}} primary rows are not inserted and cannot overwrite material cache; subresource keys unchanged. Co-authored-by: Cursor <cursoragent@cursor.com>

Restore GA sync script with material-only merge upsert.

a63cc7b

Supports postgres-to-postgres sync via temp-table merge for prod tables without a unique index on cache_key. Co-authored-by: Cursor <cursoragent@cursor.com>

fix(scripts): drop PROD_DATABASE_URL default from PCI linker

7b77093

Avoid accidental production writes when running link_pci_dss_cre.py without explicit --cache-file or CRE_CACHE_FILE. Co-authored-by: Cursor <cursoragent@cursor.com>

Fix IndexError in get_cre_by_db_id when get_CREs is empty

418f90c

Guard against an empty get_CREs result so callers get None instead of IndexError when a DB row exists but no matching CRE document is found. Co-authored-by: Cursor <cursoragent@cursor.com>

Address CodeRabbit review and fix black formatting

93a2e5e

Harden PCI env parsing, tighten sync script safety checks, make bridge fallback tests deterministic, and format files flagged by CI black. Co-authored-by: Cursor <cursoragent@cursor.com>

Add docstrings to satisfy CodeRabbit coverage on PR OWASP#918.

70aeb1e

Co-authored-by: Cursor <cursoragent@cursor.com>

Add shared Cursor agent instructions for OpenCRE.

932bf50

Track AGENTS.md and .cursor/rules so the team shares human-plan-then-agent-execute workflows, CI/PR policies, and domain safety guardrails. Co-authored-by: Cursor <cursoragent@cursor.com>

ci(codeql): upgrade CodeQL Action v1->v4 and checkout v2->v4

2b3062a

fix: correct expected tags value in test_dbNodeFromNode

d1b684b

The test expected tags="1,2" but dbNodeFromCode joins the input list ["111-111", "222-222"] with commas, producing "111-111,222-222". The expected value in the test was wrong.

ci: declare contents:read on test workflow

589a6da

Signed-off-by: Arpit Jain <arpitjain099@gmail.com>

Fix float parsing of ISO section codes like 5.10

ffcd311

Replace get_all_records() with get_all_values() to bypass gspread's numericise() which converts section codes like '5.10' to float 5.1. get_all_values() returns raw strings, preserving trailing zeros. Fixes OWASP#574 Fixes OWASP#546
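
The loss of trailing zeros can be reproduced without gspread. The sketch below uses a hypothetical stand-in, `naive_numericise`, that tries `int`, then `float`, and otherwise keeps the string. It is an assumption about the general shape of numeric coercion, not gspread's own code.

```python
def naive_numericise(value):
    """Hypothetical stand-in: coerce numeric-looking strings to numbers."""
    try:
        return int(value)
    except ValueError:
        try:
            return float(value)
        except ValueError:
            return value  # not numeric, keep the raw string


# Section codes as raw strings: "5.1" and "5.10" are different sections.
raw_codes = ["5.1", "5.10", "A.8"]
coerced = [naive_numericise(code) for code in raw_codes]

# After coercion the two distinct codes collapse to the same float.
collapsed = coerced[0] == coerced[1]
```

Keeping the cell values as strings, as the commit does by reading raw values, avoids the collision because `"5.1"` and `"5.10"` remain distinct.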

test: mock get_all_values for read_spreadsheet ISO section IDs

e324215

Align spreadsheet_test with get_all_values-based read path so section codes like 5.10 stay strings instead of being float-coerced. Co-authored-by: Cursor <cursoragent@cursor.com>

fix(spreadsheet): harden get_all_values row parsing edge cases

5477871

Handle empty worksheets and pad short rows so section IDs are preserved as strings without IndexError or truncated dict keys.

fix(spreadsheet): reject duplicate headers in get_all_values path

561cc2f

Fail fast with GSpreadException when worksheet header row contains duplicates, use zip(strict=True) after row padding, and add regression tests for the helper and read_spreadsheet integration.

northdpole and others added 3 commits June 14, 2026 23:59

fix(workstream-c): address CodeRabbit review comments

88a14db

shreeshtripurwarcomp23-coder force-pushed the workstream-c-categorization branch from 604e85a to 88a14db Compare June 14, 2026 18:29

coderabbitai Bot reviewed Jun 14, 2026

Comment thread application/utils/external_project_parsers/parsers/cheatsheet_categorizer.py

style: format web_main_test.py with black

936f7eb

shreeshtripurwarcomp23-coder force-pushed the workstream-c-categorization branch from 99be263 to 936f7eb Compare June 14, 2026 18:53

Merge branch 'main' into workstream-c-categorization

634d4a8

style: apply minor cleanups for iterable unpacking and type hinting

86088d4

coderabbitai Bot reviewed Jun 14, 2026

feat: implement cheatsheet headings validation and formatting fixes

ed3b038

shreeshtripurwarcomp23-coder force-pushed the workstream-c-categorization branch from 86088d4 to ed3b038 Compare June 14, 2026 19:41

Merge branch 'backup-work' into workstream-c-categorization

931dace

shreeshtripurwarcomp23-coder closed this Jun 14, 2026

shreeshtripurwarcomp23-coder reopened this Jun 14, 2026

coderabbitai Bot reviewed Jun 14, 2026

Comment thread import_telemetry.json

shreeshtripurwarcomp23-coder closed this Jun 15, 2026

Bornunique911 commented Jun 14, 2026

8 participants
