feat(workstream-c): cheatsheet categorization and grouping #932

Closed
shreeshtripurwarcomp23-coder wants to merge 35 commits into OWASP:main from shreeshtripurwarcomp23-coder:workstream-c-categorization

Conversation

@shreeshtripurwarcomp23-coder

Contributor

Workstream C — Categorization and Optional Grouping

Closes Issue C from the RFC: Autonomous LLM Pipeline for OWASP Cheat Sheet to CRE Mapping

What this PR delivers

  • categorize_cheatsheet(record) — labels cheat sheets using a 29-label controlled taxonomy via deterministic keyword matching
  • group_cheatsheets(records) — groups cheat sheets by category with stable sha256-based group IDs
  • LLM-optional path with full fallback to deterministic mode on any failure
  • 50 tests covering all 5 acceptance criteria from RFC Issue C
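The deterministic path described above can be sketched roughly as follows. This is a minimal illustration, not the PR's code: the three-label `TAXONOMY`, the `KEYWORD_RULES` list, and the reduced `Record` type are all assumptions made for the example (the real taxonomy has 29 labels).

```python
from dataclasses import dataclass, field

UNCATEGORIZED = "uncategorized"

# Hypothetical three-label subset; the PR's taxonomy has 29 labels.
TAXONOMY = frozenset({"authentication", "secrets_management", "input_validation"})

# Hypothetical keyword -> label rules, scanned against lowercased text.
KEYWORD_RULES = [
    ("password", "authentication"),
    ("login", "authentication"),
    ("secret", "secrets_management"),
    ("sanitiz", "input_validation"),
]


@dataclass
class Record:
    """Reduced stand-in for CheatsheetRecord (assumption for this sketch)."""
    title: str
    headings: list[str] = field(default_factory=list)


def categorize(record: Record) -> list[str]:
    """Return sorted, de-duplicated taxonomy labels, or the sentinel."""
    text = " ".join([record.title, *record.headings]).lower()
    labels = {label for keyword, label in KEYWORD_RULES if keyword in text}
    # Only approved labels survive; no match yields the sentinel, never a crash.
    return sorted(labels & TAXONOMY) or [UNCATEGORIZED]
```

Because the result depends only on the input text and a fixed rule list, repeated calls with the same record return the same labels.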

Checklist

  • C1 — Taxonomy defined (29 controlled labels, single source of truth)
  • C2 — Deterministic rule-based categorizer
  • C3 — Grouping with stable IDs across repeated runs
  • C4 — LLM-optional path with safe fallback
  • C5 — 50 tests, all passing
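One way the LLM-optional path with safe fallback (C4) might look is sketched below. The `llm` callable, the two-label taxonomy, and the keyword table are hypothetical; the PR's actual signature may differ.

```python
from typing import Callable, Optional

UNCATEGORIZED = "uncategorized"
TAXONOMY = frozenset({"authentication", "cryptography"})  # hypothetical subset
KEYWORDS = {"auth": "authentication", "crypt": "cryptography"}  # hypothetical


def deterministic_labels(text: str) -> list[str]:
    """Keyword baseline used whenever the LLM path is absent or fails."""
    lowered = text.lower()
    found = sorted({label for kw, label in KEYWORDS.items() if kw in lowered})
    return found or [UNCATEGORIZED]


def categorize(
    text: str, llm: Optional[Callable[[str], list[str]]] = None
) -> list[str]:
    """Try the optional LLM first; fall back to the baseline on any problem."""
    if llm is not None:
        try:
            proposed = llm(text)
            # Discard anything outside the approved taxonomy.
            valid = sorted({label for label in proposed if label in TAXONOMY})
            if valid:
                return valid
        except Exception:
            pass  # any LLM failure falls through to the deterministic path
    return deterministic_labels(text)
```

In this sketch an LLM that raises, returns a non-iterable, or proposes only unknown labels produces the same output as running without an LLM at all.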

Acceptance criteria met

  • Labels only from approved taxonomy
  • Same input → same output (deterministic)
  • Group IDs stable across runs
  • Unknown inputs → uncategorized, no crash
  • 3+ categories + unknown case covered in tests

Note

CheatsheetRecord is currently a local stub that matches the RFC contract exactly. It will be replaced with an import from Workstream B once that PR merges; this is a one-line change that affects no logic.

- Implement categorize_cheatsheet() with 29-label controlled taxonomy
- Implement group_cheatsheets() with stable sha256-based group IDs
- Deterministic keyword/rule baseline, no LLM dependency
- LLM-optional path with safe fallback on failure
- 50 tests covering all acceptance criteria from RFC Issue C

CheatsheetRecord uses local stub pending Workstream B merge.
@coderabbitai

coderabbitai Bot commented Jun 14, 2026

Contributor

Review Change Stack

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.


Summary by CodeRabbit

  • New Features

    • Added cheatsheet categorization and grouping system with intelligent keyword-based organization.
    • Improved smartlink navigation: direct redirect to single linked resource instead of intermediate page.
  • Tests

    • Added comprehensive test suite for cheatsheet categorization and grouping functionality.
  • Chores

    • Enhanced telemetry data with import run records and event tracking.

Walkthrough

This PR adds a new cheatsheet_categorizer module with a controlled taxonomy, deterministic keyword-to-label matching, optional LLM-based labeling with safe fallback, and stable grouping via CheatsheetGroup. A corresponding test suite covers all code paths. Separately, the smartlink route gains a fast-path redirect to /cre/<id> when a node has exactly one CRE link.

Changes

Cheatsheet Categorizer Module

| Layer / File(s) | Summary |
|---|---|
| Taxonomy, keyword rules, and data contracts (application/utils/external_project_parsers/parsers/cheatsheet_categorizer.py) | Defines module context, TAXONOMY and UNCATEGORIZED sentinel, ordered keyword-to-label rules, CheatsheetRecord dataclass with __post_init__ validation, and CheatsheetGroup dataclass with deterministic make_group_id via truncated SHA-256. |
| categorize_cheatsheet, group_cheatsheets, and internal helpers (application/utils/external_project_parsers/parsers/cheatsheet_categorizer.py) | Implements categorize_cheatsheet with optional LLM invocation, label validation, and deterministic fallback; group_cheatsheets bucketing by group_id and sorting; _build_searchable_text, _deterministic_categorize keyword scan, and _validate_labels filtering. |
| Cheatsheet categorizer test suite (application/tests/test_cheatsheet_categorizer.py) | Full unit test suite covering taxonomy integrity, deterministic categorization with sorting/deduplication, UNCATEGORIZED fallback, all LLM-path variants with injection and fallback scenarios, group_cheatsheets behavior and ordering, make_group_id determinism and format, _validate_labels filtering, and _deterministic_categorize expectations. |

Smartlink Single-CRE Redirect

| Layer / File(s) | Summary |
|---|---|
| Single-CRE fast-path in smartlink route and test update (application/web/web_main.py, application/tests/web_main_test.py) | Inserts a redirect to /cre/<cre_id> in the smartlink handler when the resolved node has exactly one CRE link; updates test_smartlink to assert 302 redirects to /cre/ pages for single-CRE cases and adds a second linked CRE fixture for the multi-CRE path. |
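The single-CRE fast path can be reduced to a small decision function. The sketch below is framework-free and hypothetical: `single_cre_redirect` is an illustrative name, and the real handler lives in a web route that would issue the 302 itself.

```python
from typing import Optional


def single_cre_redirect(cre_ids: list[str]) -> Optional[str]:
    """Return the /cre/<id> path when exactly one CRE is linked, else None."""
    if len(cre_ids) == 1:
        return f"/cre/{cre_ids[0]}"
    # Zero or several links: the caller keeps its existing behaviour.
    return None
```

A route would redirect to the returned path when it is not `None` and otherwise render the intermediate page as before.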

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 43.48% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title clearly and concisely summarizes the main changes: implementing cheatsheet categorization and grouping functionality as part of Workstream C. |
| Description check | ✅ Passed | The description thoroughly explains the PR's purpose, deliverables, acceptance criteria, and implementation details related to the changeset. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 Biome (2.4.16)
import_telemetry.json

File contains syntax errors that prevent linting: "End of file expected" is reported on every line from Line 2 through Line 25.



@coderabbitai coderabbitai Bot left a comment

Contributor

Actionable comments posted: 4

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@application/tests/test_cheatsheet_categorizer.py`:
- Around line 330-344: Both test_same_category_same_group and
test_different_categories_different_groups contain assertions guarded by
conditional statements, allowing them to pass without testing anything if the
categorization unexpectedly changes. Fix this by converting the conditional
guards into direct assertions that verify the expected label relationships
first. In test_same_category_same_group, change the if statement to
self.assertEqual(auth_labels, pwd_labels) so the label equality is asserted
unconditionally, then follow with the grouping assertions. Similarly, in
test_different_categories_different_groups, change the if statement to
self.assertNotEqual(auth_labels, secrets_labels) to unconditionally assert the
labels differ, then follow with the grouping assertions. This ensures these
tests fail if categorization behavior changes unexpectedly.
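The difference between a guarded and an unconditional assertion can be illustrated with a small self-contained test case. `categorize` and `group_key` below are hypothetical stand-ins, not the PR's functions.

```python
import unittest


def categorize(title: str) -> list[str]:
    """Hypothetical stand-in for categorize_cheatsheet."""
    lowered = title.lower()
    if "password" in lowered or "authentication" in lowered:
        return ["authentication"]
    if "secret" in lowered:
        return ["secrets_management"]
    return ["uncategorized"]


def group_key(labels: list[str]) -> str:
    """Hypothetical stand-in for the group ID derivation."""
    return "|".join(sorted(set(labels)))


class GroupingTests(unittest.TestCase):
    def test_same_category_same_group(self):
        auth_labels = categorize("Authentication")
        pwd_labels = categorize("Password Storage")
        # Assert the precondition instead of wrapping the check in `if`.
        self.assertEqual(auth_labels, pwd_labels)
        self.assertEqual(group_key(auth_labels), group_key(pwd_labels))

    def test_different_categories_different_groups(self):
        auth_labels = categorize("Authentication")
        secrets_labels = categorize("Secrets Management")
        self.assertNotEqual(auth_labels, secrets_labels)
        self.assertNotEqual(group_key(auth_labels), group_key(secrets_labels))
```

With the `if` guard, a change that made the two label lists differ would skip the grouping assertion and the test would still pass; here the same change fails on the first assertion.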

In
`@application/utils/external_project_parsers/parsers/cheatsheet_categorizer.py`:
- Around line 343-349: The loop variable `l` (lowercase letter L) on lines 343
and 346 violates Ruff E741 linting rules because it is ambiguous and looks like
the number 1. Rename this variable to a more descriptive name like `label`
throughout the code block. Replace `l` in the list comprehension (valid = [l for
l in labels...]), in the for loop declaration (for l in valid:), and in the
conditional check (if l not in seen:) and subsequent operations
(deduped.append(l) and seen.add(l)) with the new variable name to satisfy lint
requirements.
- Around line 170-189: The CheatsheetRecord dataclass documents that required
fields (source, source_id, title, hyperlink, summary, headings,
raw_markdown_path) must be non-empty strings or lists after normalization, but
this constraint is not enforced. Add a __post_init__ method to the
CheatsheetRecord dataclass that validates each required field is non-empty
(non-empty string or non-empty list), raising a descriptive ValueError with
field-level details if any required field is empty or invalid.
- Around line 209-211: The make_group_id function does not properly implement
set-based behavior because it sorts the labels list directly without removing
duplicates. When duplicate labels exist, different label sets produce different
hashes, violating the documented set-based contract. Fix this by converting the
labels parameter to a set before sorting it, so that duplicate labels are
eliminated and semantically equivalent label sets always produce the same stable
hash.
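The last two findings for this file (the ambiguous `l` variable and the set-based group ID) could be addressed along these lines. This is only a sketch: the `|` separator, the 12-character truncation, and the two-label taxonomy are assumptions, since the thread does not show the real values.

```python
import hashlib

TAXONOMY = frozenset({"authentication", "cryptography"})  # hypothetical subset


def make_group_id(labels: list[str]) -> str:
    """Hash the label *set*, so duplicates and ordering do not change the ID."""
    canonical = "|".join(sorted(set(labels)))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:12]


def validate_labels(labels: list[str]) -> list[str]:
    """Keep taxonomy labels only, de-duplicated in first-seen order."""
    valid = [label for label in labels if label in TAXONOMY]
    seen: set[str] = set()
    deduped: list[str] = []
    for label in valid:  # `label` rather than `l`, which Ruff E741 rejects
        if label not in seen:
            seen.add(label)
            deduped.append(label)
    return deduped
```

SHA-256 over a canonical string is stable across processes and runs, unlike Python's built-in `hash()` for strings, which is randomized per process by default.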

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

Run ID: 39b63327-3bdc-452c-bd18-66396c5b7cb8

📥 Commits

Reviewing files that changed from the base of the PR and between e853cd3 and 43cd7e5.

📒 Files selected for processing (4)
  • application/tests/test_cheatsheet_categorizer.py
  • application/tests/web_main_test.py
  • application/utils/external_project_parsers/parsers/cheatsheet_categorizer.py
  • application/web/web_main.py

Comment thread application/tests/test_cheatsheet_categorizer.py Outdated
Comment thread application/utils/external_project_parsers/parsers/cheatsheet_categorizer.py Outdated

@coderabbitai coderabbitai Bot left a comment

Contributor

♻️ Duplicate comments (1)
application/utils/external_project_parsers/parsers/cheatsheet_categorizer.py (1)

194-210: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Validate list fields in __post_init__ to prevent categorize_cheatsheet runtime crashes.

CheatsheetRecord.__post_init__ currently guards only required string fields. If headings/category_hints contain non-strings, Line 342 (" ".join(parts)) can raise TypeError, which breaks the “unknown input should not crash” behavior.

Proposed fix
 def __post_init__(self) -> None:
@@
         for fname in required_str_fields:
             value = getattr(self, fname)
             if not isinstance(value, str) or not value.strip():
                 raise ValueError(
                     f"CheatsheetRecord.{fname} must be a non-empty string, "
                     f"got {value!r}"
                 )
+
+        required_list_fields = ["headings", "category_hints"]
+        for fname in required_list_fields:
+            value = getattr(self, fname)
+            if not isinstance(value, list):
+                raise ValueError(f"CheatsheetRecord.{fname} must be a list, got {type(value).__name__}")
+            if any(not isinstance(item, str) for item in value):
+                raise ValueError(f"CheatsheetRecord.{fname} must contain only strings")

Also applies to: 339-342
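For context, a complete, runnable version of the proposed validation might look like the sketch below. The field set is deliberately reduced to three fields (an assumption for brevity); the real CheatsheetRecord has more required fields.

```python
from dataclasses import dataclass, field


@dataclass
class CheatsheetRecord:
    """Reduced stand-in: only three of the real record's fields are modelled."""
    title: str
    headings: list[str]
    category_hints: list[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        if not isinstance(self.title, str) or not self.title.strip():
            raise ValueError(
                f"CheatsheetRecord.title must be a non-empty string, "
                f"got {self.title!r}"
            )
        for fname in ("headings", "category_hints"):
            value = getattr(self, fname)
            if not isinstance(value, list):
                raise ValueError(
                    f"CheatsheetRecord.{fname} must be a list, "
                    f"got {type(value).__name__}"
                )
            if any(not isinstance(item, str) for item in value):
                raise ValueError(
                    f"CheatsheetRecord.{fname} must contain only strings"
                )
```

Rejecting bad list contents at construction time means a later `" ".join(...)` over these fields cannot raise `TypeError`.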

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@application/utils/external_project_parsers/parsers/cheatsheet_categorizer.py`
around lines 194 - 210, The __post_init__ method in CheatsheetRecord currently
validates only string fields but does not validate the list fields headings and
category_hints. If these list fields contain non-string values, the "
".join(parts) call in categorize_cheatsheet will raise a TypeError at runtime.
Add validation in __post_init__ to ensure that headings and category_hints are
present and contain only string elements, raising a ValueError with a
descriptive message if validation fails. This will prevent runtime crashes when
non-string values are passed in these list fields.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

Run ID: 0859c5cd-3886-42e4-96f8-08c6c3dfe810

📥 Commits

Reviewing files that changed from the base of the PR and between 484afac and 604e85a.

📒 Files selected for processing (2)
  • application/tests/test_cheatsheet_categorizer.py
  • application/utils/external_project_parsers/parsers/cheatsheet_categorizer.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • application/tests/test_cheatsheet_categorizer.py

…#901) (OWASP#909)

The issue was in CommonRequirementEnumeration.tsx — the collapse logic (slicing the list after 5 items and showing a "Show more" button) was being applied to all link types including CREs, with no distinction between CRE links and links to external standards.

The fix adds an allLinksAreCres check that mirrors the same pattern already correctly implemented in DocumentNode.tsx. When all links in a group are of type CRE, the full list is shown without slicing and the "Show more" button is hidden. Links to external standards continue to collapse as before.

Changes are limited to a single file: CommonRequirementEnumeration.tsx

  • Added DOCUMENT_TYPES import from ../../const
  • Compute allLinksAreCres before rendering each link group
  • Use visibleResults (full list for CREs, sliced for others) instead of inline slice
  • Guard the "Show more" button with !allLinksAreCres
* feat: implement structured extraction checkpoints B1 and B2

Signed-off-by: Abhijeet Saharan <abhijeetsaharan2236@gmail.com>

* docs: improve formatting

Signed-off-by: Abhijeet <abhijeetsaharan2236@gmail.com>

* fix: improve normalization of required string fields

Signed-off-by: Abhijeet <abhijeetsaharan2236@gmail.com>

* docs: add docstrings

Signed-off-by: Abhijeet Saharan <abhijeetsaharan2236@gmail.com>

* fix: validate normalized string field values correctly

Signed-off-by: Abhijeet Saharan <abhijeetsaharan2236@gmail.com>

---------

Signed-off-by: Abhijeet Saharan <abhijeetsaharan2236@gmail.com>
Signed-off-by: Abhijeet <abhijeetsaharan2236@gmail.com>
…WASP#823)

* Add curated CWE fallback mappings

* Cover CWE fallback and inheritance behavior with tests

* Add local CWE refresh tooling

* Add local helper scripts for issue OWASP#472

* Integrate OpenCRE map analysis support from issue OWASP#469

* Implement fallback for gap analysis in database with error handling

* Update scripts/show-db-stats.sh

Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Signed-off-by: Bornunique911 <69379200+Bornunique911@users.noreply.github.com>

* fix: remove leading space in 'xss' keyword for CWE mapping

* fix: update condition for related CWE entries to check for 'ChildOf' nature

* fix: correct syntax for accessing related CWE entry attributes

* fix: enhance gap analysis error handling for Heroku and fallback scenarios

---------

Signed-off-by: Bornunique911 <69379200+Bornunique911@users.noreply.github.com>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
…nitor

PR OWASP#823 reintroduced Neo4j/Redis fallback on Heroku cache misses, causing 503s
when Neo4j DNS fails. Serve precomputed GA from Postgres only on Heroku and
return 404 on cache miss. Add monitor_ga_health.py for production regression
alerting (especially HTTP 503).

Fixes OWASP#923

Co-authored-by: Cursor <cursoragent@cursor.com>
Cloudflare blocks anonymous urllib requests to ga_standards on production.

Co-authored-by: Cursor <cursoragent@cursor.com>
Allow AGENTS.md through the *.md gitignore exception and document that
Heroku/opencreorg gap analysis is cache-only (no compute on production).

Co-authored-by: Cursor <cursoragent@cursor.com>
Guard add_gap_analysis_result so non-material {"result":{}} primary rows
are not inserted and cannot overwrite material cache; subresource keys unchanged.

Co-authored-by: Cursor <cursoragent@cursor.com>
Supports postgres-to-postgres sync via temp-table merge for prod tables
without a unique index on cache_key.

Co-authored-by: Cursor <cursoragent@cursor.com>
Document operational scripts and weekly prod GA checks in AGENTS.md; add
link_pci_dss_cre.py for embedding-based CRE linking. Harden primary GA cache
key detection, sync script materiality guards, monitor 503 test, and DSN redaction.

Co-authored-by: Cursor <cursoragent@cursor.com>
Avoid accidental production writes when running link_pci_dss_cre.py
without explicit --cache-file or CRE_CACHE_FILE.

Co-authored-by: Cursor <cursoragent@cursor.com>
Guard against an empty get_CREs result so callers get None instead of
IndexError when a DB row exists but no matching CRE document is found.

Co-authored-by: Cursor <cursoragent@cursor.com>
Serve precomputed OpenCRE GA from cache on Heroku instead of computing on
the web dyno, expand backfill to include automatic CRE links, and harden
PCI DSS / Secure Headers imports with better linking and parser fixes.

Co-authored-by: Cursor <cursoragent@cursor.com>
Harden PCI env parsing, tighten sync script safety checks, make bridge
fallback tests deterministic, and format files flagged by CI black.

Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
Track AGENTS.md and .cursor/rules so the team shares human-plan-then-agent-execute workflows, CI/PR policies, and domain safety guardrails.

Co-authored-by: Cursor <cursoragent@cursor.com>
Add modular .cursor/rules for requirements gates, tickets, TDD, and
verification; tighten plan-first and multi-agent flows; slim AGENTS.md
to an index aligned with make lint/mypy/test checks.

Co-authored-by: Cursor <cursoragent@cursor.com>
The test expected tags="1,2" but dbNodeFromCode joins the input list
["111-111", "222-222"] with commas, producing "111-111,222-222".
The expected value in the test was wrong.
Signed-off-by: Arpit Jain <arpitjain099@gmail.com>
Closes OWASP#862

request.args.get('text') returns None if the query param is absent.
Passing None into db.text_search() causes re.search() to raise
TypeError: expected string or bytes-like object.
Return a 400 before reaching the database call.
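The failure mode and the guard described in this commit message can be reproduced without a web framework. In the sketch below, `fake_text_search` and `handle` are hypothetical stand-ins for `db.text_search()` and the route handler, and a plain dict plays the role of `request.args`.

```python
import re
from typing import Mapping


def fake_text_search(text):
    """Hypothetical stand-in for db.text_search(); raises TypeError on None."""
    return re.search(r"\w+:\w+", text)


def handle(args: Mapping[str, str]) -> int:
    """Return an HTTP-style status code for a text search request."""
    text = args.get("text")  # None when the query parameter is absent
    if text is None:
        return 400  # reject before the search layer is reached
    fake_text_search(text)
    return 200
```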
Replace get_all_records() with get_all_values() to bypass gspread's
numericise() which converts section codes like '5.10' to float 5.1.
get_all_values() returns raw strings, preserving trailing zeros.

Fixes OWASP#574
Fixes OWASP#546
Align spreadsheet_test with get_all_values-based read path so section
codes like 5.10 stay strings instead of being float-coerced.

Co-authored-by: Cursor <cursoragent@cursor.com>
Handle empty worksheets and pad short rows so section IDs are preserved
as strings without IndexError or truncated dict keys.
Fail fast with GSpreadException when worksheet header row contains
duplicates, use zip(strict=True) after row padding, and add regression
tests for the helper and read_spreadsheet integration.
northdpole and others added 3 commits June 14, 2026 23:59
Fix production H12 timeouts from GET /all_cres?per_page=1000 by batching
N+1 link hydration in the DB layer, capping per_page at 100, scoping
DataProvider to Explorer routes with incremental page loads, and using
ensureFullExplorerData for graph views.

Closes OWASP#930. Related: OWASP#847, OWASP#848.

Co-authored-by: Cursor <cursoragent@cursor.com>
Fix buildTree sibling keyPath mutation, serialize loadPage via promise
chain with exposed dataLoadError, hoist Explorer layout wrappers,
surface load failures in graph views, restore viewport zoom, and
harden pagination link parity test.

Co-authored-by: Cursor <cursoragent@cursor.com>

@coderabbitai coderabbitai Bot left a comment

Contributor

Actionable comments posted: 1

🧹 Nitpick comments (2)
application/utils/external_project_parsers/parsers/cheatsheet_categorizer.py (2)

341-341: ⚡ Quick win

Prefer iterable unpacking for list construction.

As flagged by Ruff RUF005, iterable unpacking is more idiomatic than list concatenation in Python.

♻️ Proposed refactor
-    parts = [record.title] + record.headings + record.category_hints
+    parts = [record.title, *record.headings, *record.category_hints]
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@application/utils/external_project_parsers/parsers/cheatsheet_categorizer.py`
at line 341, The list construction at the assignment to parts uses the +
operator for list concatenation, which is less idiomatic than iterable unpacking
in Python. Replace the list concatenation using + operators with iterable
unpacking syntax by converting the assignment to use a single list literal with
the record.title element followed by *record.headings and *record.category_hints
unpacking operators to combine all the elements into the parts list.

Source: Linters/SAST tools


318-318: 💤 Low value

Consider adding specific type hint for bucket.

The generic dict type hint could be more specific as dict[str, CheatsheetGroup] for better type safety and IDE support.

♻️ Proposed type hint improvement
-    bucket: dict = {}
+    bucket: dict[str, CheatsheetGroup] = {}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@application/utils/external_project_parsers/parsers/cheatsheet_categorizer.py`
at line 318, The `bucket` variable in the cheatsheet_categorizer.py file is
using a generic `dict` type hint, which lacks specificity and reduces IDE type
safety. Replace the type hint from `dict` to `dict[str, CheatsheetGroup]` to
explicitly indicate that the bucket dictionary maps string keys to
CheatsheetGroup values. Ensure that CheatsheetGroup is properly imported or
available in the scope where this change is made.
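To show the typed `bucket` in context, here is a hedged sketch of bucketing by group ID. The reduced `CheatsheetGroup` fields, the `group_by_labels` name, and the ID derivation are assumptions for illustration only.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class CheatsheetGroup:
    """Reduced stand-in for the PR's CheatsheetGroup (assumed fields)."""
    group_id: str
    labels: list[str]
    titles: list[str] = field(default_factory=list)


def group_by_labels(items: list[tuple[str, list[str]]]) -> list[CheatsheetGroup]:
    """Bucket (title, labels) pairs by a stable ID derived from the label set."""
    bucket: dict[str, CheatsheetGroup] = {}
    for title, labels in items:
        canonical = sorted(set(labels))
        digest = hashlib.sha256("|".join(canonical).encode("utf-8")).hexdigest()
        group_id = digest[:12]  # truncation length is an assumption
        group = bucket.setdefault(group_id, CheatsheetGroup(group_id, canonical))
        group.titles.append(title)
    # Sort for a deterministic output order.
    return sorted(bucket.values(), key=lambda g: g.group_id)
```

With the `dict[str, CheatsheetGroup]` annotation, a type checker can verify that `group.titles` exists on the values pulled out of `bucket`.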
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In
`@application/utils/external_project_parsers/parsers/cheatsheet_categorizer.py`:
- Around line 194-210: The __post_init__ method in the CheatsheetRecord class
validates required string fields but omits validation for the required headings
field. Add validation after the existing string field checks to ensure the
headings field is present and is a non-empty list, raising a ValueError with an
appropriate message if it is missing, not a list, or empty, maintaining
consistency with the error message format used for string field validation.


ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

Run ID: 2c123407-b815-4a80-a0a0-d553c11b3de7

📥 Commits

Reviewing files that changed from the base of the PR and between 604e85a and 99be263.

📒 Files selected for processing (4)
  • application/tests/test_cheatsheet_categorizer.py
  • application/tests/web_main_test.py
  • application/utils/external_project_parsers/parsers/cheatsheet_categorizer.py
  • application/web/web_main.py
🚧 Files skipped from review as they are similar to previous changes (3)
  • application/tests/web_main_test.py
  • application/web/web_main.py
  • application/tests/test_cheatsheet_categorizer.py

@coderabbitai

coderabbitai Bot commented Jun 14, 2026

Contributor

This command requires write access to the repository. Only users with write or admin permissions can trigger CodeRabbit to commit or create pull requests.

@Bornunique911

Contributor

@coderabbitai review

@coderabbitai

coderabbitai Bot commented Jun 14, 2026

Contributor
✅ Action performed

Review finished.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@coderabbitai coderabbitai Bot left a comment

Contributor

♻️ Duplicate comments (1)
application/utils/external_project_parsers/parsers/cheatsheet_categorizer.py (1)

194-210: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Missing validation for required headings field.

This issue was already flagged in a previous review comment. The __post_init__ method validates required string fields but does not check the headings field. According to line 181, "Required fields must be non-empty strings / lists after normalisation," but headings (line 189) is a required field with no default that is not validated. An empty list or non-list value would violate the contract.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@application/utils/external_project_parsers/parsers/cheatsheet_categorizer.py`
around lines 194 - 210, The `__post_init__` method in the CheatsheetRecord class
validates required string fields but is missing validation for the `headings`
field, which is also a required field according to the class design. Add
validation logic after the existing string field validation loop to check that
`headings` is a non-empty list and raise a ValueError with a descriptive message
if it is not a list, is empty, or has no valid content, following the same
pattern and structure as the string field validation checks above it.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

Run ID: bdc441f9-0dab-4e63-9b95-9b14ab8ce2fb

📥 Commits

Reviewing files that changed from the base of the PR and between 99be263 and 86088d4.

📒 Files selected for processing (2)
  • application/tests/web_main_test.py
  • application/utils/external_project_parsers/parsers/cheatsheet_categorizer.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • application/tests/web_main_test.py

@shreeshtripurwarcomp23-coder

Contributor Author

@coderabbitai review

@coderabbitai

coderabbitai Bot commented Jun 14, 2026

Contributor
✅ Action performed

Review finished.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@coderabbitai coderabbitai Bot left a comment

Contributor

Actionable comments posted: 1

🧹 Nitpick comments (1)
import_telemetry.json (1)

1-25: 💤 Low value

Biome parse errors are false positives for NDJSON format.

The static analysis tool Biome (a JavaScript/TypeScript linter) is reporting parse errors because it expects a single JSON object, but this file uses NDJSON (newline-delimited JSON) format where each line is a separate JSON object. This is the correct format per the upstream emit_import_event implementation that writes json.dumps(event) + "\n".

If this file is meant to be committed, consider excluding it from Biome checks by adding it to the Biome ignore configuration, or document that these errors are expected for NDJSON files.
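The distinction between a single JSON document and NDJSON is easy to demonstrate with the standard library. In the sketch below the event fields are made up for illustration; only the `json.dumps(event) + "\n"` write pattern comes from the comment above.

```python
import json


def parse_ndjson(text: str) -> list[dict]:
    """Parse newline-delimited JSON: one independent JSON value per line."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


# Two events written the way the comment describes: json.dumps(event) + "\n".
# The field names are hypothetical.
sample = (
    json.dumps({"event": "import_start"}) + "\n"
    + json.dumps({"event": "import_end"}) + "\n"
)
```

Passing `sample` to `json.loads` as a whole raises `json.JSONDecodeError`, which is the same class of complaint a single-document JSON parser makes from the second line onward.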

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@import_telemetry.json` around lines 1 - 25, The import_telemetry.json file
contains NDJSON format (newline-delimited JSON) where each line is a separate
valid JSON object, which matches the upstream emit_import_event implementation
that writes json.dumps(event) + "\n". Biome is incorrectly reporting parse
errors because it expects a single JSON object. Add the import_telemetry.json
file to the Biome ignore configuration in your biome.json config file by adding
the file path to the ignore patterns or files list, so Biome will skip linting
this file and these false positive errors will be resolved.

Source: Linters/SAST tools

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@import_telemetry.json`:
- Around line 1-25: The file import_telemetry.json contains transient runtime
telemetry data generated by the telemetry.py module during test execution and
should not be committed to version control. Delete import_telemetry.json from
the repository and add the filename to the .gitignore file to prevent accidental
commits of future telemetry artifacts generated during local development.


ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

Run ID: 6897a458-7002-46e1-a2b9-1804bd249296

📥 Commits

Reviewing files that changed from the base of the PR and between 99be263 and 931dace.

📒 Files selected for processing (3)
  • application/tests/web_main_test.py
  • application/utils/external_project_parsers/parsers/cheatsheet_categorizer.py
  • import_telemetry.json
🚧 Files skipped from review as they are similar to previous changes (1)
  • application/utils/external_project_parsers/parsers/cheatsheet_categorizer.py

Comment thread import_telemetry.json


8 participants