[BUG] Reports containing scanned content with ANSI/control bytes are emitted as binary (break Markdown rendering)

## Summary

When a scanned skill's content (or LLM output quoting it) contains ANSI escape
sequences or control bytes (e.g. `NUL` `\x00`, `ESC` `\x1b`), those bytes flow
verbatim into finding text and then into the generated report. The result is a
report file that tools treat as **binary**:

- **GitLab / GitHub / editors** detect the `.md` as binary and offer
  *"download"* instead of rendering it (`file report.md` → `data`, not text).
- **Terminal** output is garbled by stray escape sequences.

## Repro

Scan any skill whose content includes a terminal-colored snippet or a stray
control byte (real-world: build/log-analysis skills that embed ANSI codes), then
write a Markdown report:

```bash
skillspector scan ./skill/ --format markdown --output report.md
file report.md      # -> "data" (binary), should be UTF-8 text
```

The report won't render inline in GitLab/GitHub; you only get a download link.
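To check a generated report without relying on `file`, a rough heuristic along these lines may help. This is only a sketch: `looks_binary` is a hypothetical helper, and it approximates "would a tool treat this as non-text" rather than reproducing the exact logic of `file`, GitLab, or GitHub.

```python
def looks_binary(data: bytes) -> bool:
    """Rough heuristic (an assumption, not any tool's real detection logic)."""
    try:
        text = data.decode("utf-8")
    except UnicodeDecodeError:
        # Not valid UTF-8 at all, so certainly not a clean text report.
        return True
    # Any C0 control character other than tab, newline, or carriage return
    # (for example NUL or ESC) is treated as a sign of non-text content.
    return any(ord(ch) < 0x20 and ch not in "\t\n\r" for ch in text)
```

Calling `looks_binary(pathlib.Path("report.md").read_bytes())` on the report produced by the repro should return `True` while the bug is present.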

## Expected

Reports should always be clean UTF-8. Note this is distinct from #144 (skipping
binary input files) — here the problem is control bytes in the output report,
which can originate from any quoted snippet even in a text file.

## Proposed fix

Strip ANSI escape sequences and disallowed control chars (keeping tab/newline and
multibyte UTF-8) from finding text in the report node, so every format
(terminal/JSON/Markdown/SARIF) stays clean. PR incoming.

## PR is ready to open (branch pushed: `assinchu:feature/report-sanitizer`)

Once the issue has a number (say **#NNN**), I'll open the PR with this body:

```markdown
## Summary

Fixes #NNN. Scanned content (and LLM output quoting it) can carry ANSI escape
sequences and control bytes (NUL, ESC, ...) into finding text. Emitted verbatim,
these make a report register as **binary** — GitLab/editors offer "download"
instead of rendering the Markdown, and terminals print garbled output.

## Change

Sanitize every finding's free-text fields once in the **report node** (the single
scoring/formatting point), so terminal, JSON, Markdown, and SARIF output all stay
clean UTF-8. Tabs, newlines, and multibyte UTF-8 (e.g. emoji severity markers)
are preserved. Non-text fields and counts are untouched.

- `_clean_text` / `_sanitize_finding` + `_ANSI_RE` / `_CONTROL_RE` in `report.py`
- Applied to `filtered_findings` at the top of `report()` before scoring/format

## Tests

New `tests/nodes/test_report_sanitizer.py`: unit tests for `_clean_text` /
`_sanitize_finding`, plus a parametrized check that no `\x00`/`\x1b` leaks into
any of the four output formats while readable content survives. Full suite green;
`ruff check`/`format` clean.

This is distinct from #144 (binary *input* files); it sanitizes the *output*.
```
