Fix UnicodeEncodeError on Windows (cp1252 default) #8

Open
cnrdfrncs-droid wants to merge 1 commit into bradautomates:main from cnrdfrncs-droid:fix/windows-utf8-stdout

@cnrdfrncs-droid

Summary

scripts/watch.py crashes on Windows when it prints non-ASCII characters such as the arrow U+2192 to stdout/stderr. Both streams can default to cp1252 on Windows, which has no mapping for U+2192. The em-dashes and ellipses the script also prints are encodable in cp1252 itself, but not in every legacy Windows code page.

Repro

On a fresh Windows install with Python 3.13:

python scripts/watch.py "https://www.youtube.com/watch?v=jNQXAC9IVRw" --start 0 --end 10 --no-whisper

Crashes at the focus-range header line:

UnicodeEncodeError: 'charmap' codec can't encode character '→' in position 25: character maps to <undefined>
  File "scripts\watch.py", line 158, in main
    print(f"- **Focus range:** {format_time(...)} → {format_time(...)} ...")

The pipeline itself (download, captions, frame extraction) succeeds; only printing the final markdown report fails.
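The encoding failure can be reproduced on any platform without running the script. The following is a minimal sketch; the `can_encode` helper and the constant names are illustrative and do not come from watch.py.

```python
# Check which characters a given codec can represent.
ARROW = "\u2192"     # rightwards arrow, the character from the traceback
EM_DASH = "\u2014"
ELLIPSIS = "\u2026"


def can_encode(text, encoding):
    """Return True if text can be encoded with the named codec."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


if __name__ == "__main__":
    # Labels are plain ASCII so this prints safely on any console.
    for label, char in [("arrow", ARROW), ("em dash", EM_DASH), ("ellipsis", ELLIPSIS)]:
        print(label, "cp1252:", can_encode(char, "cp1252"), "utf-8:", can_encode(char, "utf-8"))
```

The arrow is the character that cp1252 rejects, which matches the traceback above.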

Fix

Force UTF-8 on sys.stdout / sys.stderr at script startup via .reconfigure(encoding="utf-8"). The hasattr guard skips streams without that method (e.g. when stdout has been replaced by something non-standard). On macOS/Linux the call changes nothing in practice, since both streams are normally already UTF-8.
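A minimal sketch of the guarded reconfigure described above. The helper names `force_utf8` and `demo` are assumptions for illustration, and the actual change in watch.py may be laid out differently. `reconfigure()` is available on `io.TextIOWrapper` from Python 3.7.

```python
import io
import sys


def force_utf8(stream):
    """Switch a text stream to UTF-8 if it supports reconfigure().

    Streams without the method (for example a replaced sys.stdout) are
    returned untouched, mirroring the hasattr guard.
    """
    if hasattr(stream, "reconfigure"):
        stream.reconfigure(encoding="utf-8")
    return stream


def demo():
    """Show the effect on an in-memory stream that starts out as cp1252."""
    raw = io.BytesIO()
    stream = io.TextIOWrapper(raw, encoding="cp1252")
    force_utf8(stream)
    stream.write("0:00 \u2192 0:10")  # would raise UnicodeEncodeError under cp1252
    stream.flush()
    return raw.getvalue()


if __name__ == "__main__":
    # At script startup, before anything is printed.
    force_utf8(sys.stdout)
    force_utf8(sys.stderr)
    print(demo().decode("utf-8"))
```

Calling the helper before the first print matters only for clarity here; changing the encoding of a stream that has already been written to is permitted.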

Alternatives considered:

  • Setting the PYTHONUTF8=1 environment variable: works, but pushes the burden onto every user and is not discoverable.
  • Replacing every non-ASCII character in the source (→ with ->, em-dash with --): invasive, easy to miss occurrences, and whisper.py and other scripts contain the same characters.
  • Wrapping sys.stdout in a new TextIOWrapper: equivalent, but more code.

reconfigure is the smallest, most contained fix.
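For comparison, the TextIOWrapper alternative listed above might look roughly like the sketch below. The `wrap_utf8` and `demo` names are hypothetical, and this is not code from the PR.

```python
import io


def wrap_utf8(stream):
    """Return a new UTF-8 text wrapper around the stream's binary buffer.

    Streams that expose no .buffer attribute are returned unchanged.
    """
    buffer = getattr(stream, "buffer", None)
    if buffer is None:
        return stream
    stream.flush()  # write out anything pending in the old wrapper first
    return io.TextIOWrapper(
        buffer,
        encoding="utf-8",
        errors=getattr(stream, "errors", "strict"),
        line_buffering=True,
    )


def demo():
    """Wrap an in-memory cp1252 stream and write an arrow through the wrapper."""
    old = io.TextIOWrapper(io.BytesIO(), encoding="cp1252")
    new = wrap_utf8(old)
    new.write("\u2192")
    new.flush()
    return old.buffer.getvalue()


if __name__ == "__main__":
    print(demo())
```

In a script this would be applied as `sys.stdout = wrap_utf8(sys.stdout)`, and likewise for `sys.stderr`. It needs an extra guard for the buffer, a flush, and a reassignment, which illustrates why `reconfigure` is the shorter option.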

Test plan

  • Verified on Windows 11 / Python 3.13 / cp1252 console: python scripts/watch.py <url> --start 0 --end 10 --no-whisper runs to completion without PYTHONUTF8 set and prints the → character cleanly in the focus-range header and final report.
  • No behavior change on macOS/Linux (streams already UTF-8).

watch.py prints non-ASCII chars (e.g. the arrow U+2192) directly to
stdout/stderr. On Windows, both streams can default to cp1252, which
cannot encode the arrow, so the script raises UnicodeEncodeError when it
prints its focus-range header or final report.

Repro on Windows:
    python scripts/watch.py <url> --start 0 --end 10
=> UnicodeEncodeError: 'charmap' codec can't encode character '→'

Force UTF-8 on stdout/stderr at startup. No-op on macOS/Linux (already
UTF-8) and on streams without .reconfigure() (the hasattr guard).