Feature/evidence type filter #5
Lets a user restrict which evidence types get validated by an execution instead of always processing all three. Adds a generic processing_config JSONB column (Alembic migration) on Execution so future per-execution processing toggles have a home; evidence_types rides inside it. Plumbed through to the pre-processor (skips disallowed-type rows, reported in summary) and the processor (defense-in-depth re-check in case a stale pre-split file is reused). Supported on both create paths and on PATCH update, where it merges into the existing processing_config instead of overwriting it.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
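A minimal sketch of the PATCH merge and the type check described above, not the PR's actual code. The helper names `merge_processing_config` and `is_type_allowed` and the key `some_future_toggle` are hypothetical, and it assumes that a missing or null `evidence_types` means "no filter, process every type".

```python
from typing import Any, Dict, Optional


def merge_processing_config(
    existing: Optional[Dict[str, Any]],
    patch: Optional[Dict[str, Any]],
) -> Dict[str, Any]:
    """Shallow-merge a PATCH payload into the stored processing_config."""
    merged = dict(existing or {})  # copy so the stored value is not mutated
    merged.update(patch or {})     # keys in the patch win; other keys survive
    return merged


def is_type_allowed(evidence_type: str, processing_config: Optional[Dict[str, Any]]) -> bool:
    """Return True when the row's evidence type passes the execution's filter."""
    allowed = (processing_config or {}).get("evidence_types")
    if allowed is None:  # assumption: no filter configured means all types run
        return True
    return evidence_type in allowed


stored = {"some_future_toggle": True}  # hypothetical pre-existing key
updated = merge_processing_config(stored, {"evidence_types": ["pdf", "excel"]})
print(updated)
print(is_type_allowed("image", updated), is_type_allowed("pdf", updated))
```

The same `is_type_allowed` check could run in both the pre-processor and the processor, which is what makes the second check a defense-in-depth guard rather than a separate rule.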
Excluded rows were tagged Relevance Tag='Irrelevant' even though no API call was made for them, which inflated api_failures/failed_list in the processor's run summary and skewed the relevance percentage shown in reports (report_service.py folds every 'Irrelevant' row into the same bucket regardless of whether it was actually evaluated). Reuse 'notValidated' instead (same sentinel the relevant-evidence-cap feature already uses for "not evaluated due to execution config") and port that branch's two report/processor fixes onto this one: a 4th notValidated bucket in RELEVANCE_TYPES/create_node/relevance_counts so the breakdown still sums to total, and a rel_score() denominator that excludes notValidated so the percentage reflects only evaluated rows. Comment wording matches the sibling branch verbatim so the two branches' overlapping hunks merge without conflict later.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
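A rough illustration of the two report fixes, under stated assumptions. The function bodies are not taken from report_service.py, and the score formula (share of Relevant rows among evaluated rows) is assumed for illustration only; the real rel_score() may weight Partially Relevant differently. Every tag is assumed to be one of the four values.

```python
from collections import Counter
from typing import Dict, Iterable

RELEVANCE_TYPES = ("Relevant", "Partially Relevant", "Irrelevant", "notValidated")


def relevance_counts(tags: Iterable[str]) -> Dict[str, int]:
    """Count rows per bucket, keeping notValidated as its own 4th bucket."""
    counts = Counter(tags)
    return {name: counts.get(name, 0) for name in RELEVANCE_TYPES}


def rel_score(counts: Dict[str, int]) -> float:
    """Percentage of Relevant rows among rows that were actually evaluated."""
    # notValidated rows are left out of both numerator and denominator.
    evaluated = sum(n for name, n in counts.items() if name != "notValidated")
    if evaluated == 0:
        return 0.0  # nothing was evaluated, so avoid dividing by zero
    return 100.0 * counts["Relevant"] / evaluated


tags = ["Relevant", "Relevant", "Irrelevant", "notValidated", "notValidated"]
counts = relevance_counts(tags)
print(counts, round(rel_score(counts), 2))
```

With the excluded rows tagged Irrelevant instead, the same five rows would give 40% rather than two out of three evaluated rows, which is the skew this commit removes.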
pandas.read_excel() needs openpyxl to read .xlsx files; without it, process_excel() in scripts/processor/1-main-parallel-script.py throws "Missing optional dependency 'openpyxl'" on every excel row, retries 3x, then gives up and tags the row Irrelevant with no Q&A, so it silently looks as if excel evidence was never analyzed. Sister repo already carries this dependency (unpinned); pin it here to match how this repo pins everything else.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
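One way to surface this failure early instead of per row is a startup check. This is only a sketch: `excel_engine_available` and `require_excel_engine` are hypothetical helpers that are not part of this PR, which only pins the dependency.

```python
import importlib.util


def excel_engine_available() -> bool:
    """Return True when openpyxl can be imported in this environment."""
    return importlib.util.find_spec("openpyxl") is not None


def require_excel_engine() -> None:
    """Fail fast, once, rather than retrying and mis-tagging every excel row."""
    if not excel_engine_available():
        raise RuntimeError(
            "openpyxl is not installed; pandas.read_excel() cannot read .xlsx files"
        )


print("openpyxl available:", excel_engine_available())
```

Calling `require_excel_engine()` before the processing loop would turn a silent per-row fallback into a single visible error.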
        return None
    if not isinstance(v, list):
        raise ValueError("evidence_types must be an array of strings")
    allowed = {"image", "pdf", "excel"}
@borkarsaish65 Can we take this from the csv type config table? We can store the available allowed types there.
Also include the extensions as well.
        return None
    if not isinstance(v, list):
        raise ValueError("evidence_types must be an array of strings")
    allowed = {"image", "pdf", "excel"}
@borkarsaish65 Don't hardcode the available evidence types in code.
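A possible shape for the config-driven version the review asks for, as a sketch only. `EVIDENCE_TYPE_CONFIG` stands in for rows loaded from the config table; its name, its structure, and the extension lists are assumptions for illustration, not values from this repo.

```python
from typing import Any, Dict, List, Optional, Set

# Hypothetical stand-in for the config table: evidence type -> file extensions.
EVIDENCE_TYPE_CONFIG: Dict[str, Set[str]] = {
    "image": {".png", ".jpg", ".jpeg"},
    "pdf": {".pdf"},
    "excel": {".xlsx", ".xls"},
}


def validate_evidence_types(v: Any, config: Dict[str, Set[str]]) -> Optional[List[str]]:
    """Validate evidence_types against types read from config, not a literal set."""
    if v is None:
        return None
    if not isinstance(v, list):
        raise ValueError("evidence_types must be an array of strings")
    allowed = set(config)  # allowed types are the config's keys
    unknown = [t for t in v if not isinstance(t, str) or t not in allowed]
    if unknown:
        raise ValueError(
            f"unsupported evidence_types: {unknown}; allowed: {sorted(allowed)}"
        )
    return v


print(validate_evidence_types(["pdf", "excel"], EVIDENCE_TYPE_CONFIG))
```

Passing the config in as an argument keeps the validator free of hardcoded values, so both create paths and the PATCH path could share it.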
        states=request_data.states or [],
        criterias_mode=criterias_mode,
        threshold_config=threshold_config,
        processing_config=processing_config,
                detail="states must be a non-empty array if provided."
            )

        if field == 'evidence_types':
# excluded by the execution's evidence-type filter. It's a distinct bucket: counted in
# "total" but never folded into Relevant/Partially Relevant/Irrelevant, and excluded from
# rel_score's numerator and denominator.
RELEVANCE_TYPES = {"Relevant", "Partially Relevant", "Irrelevant", "notValidated"}
@borkarsaish65 Can we move this to commons as well?