Fix UnicodeEncodeError on Windows when video titles contain emoji #34

Open
drlee91 wants to merge 1 commit into bradautomates:main from drlee91:fix/windows-utf8-stdout

drlee91 commented Jun 9, 2026

Problem

On Windows, Python's stdout/stderr default to the locale code page (typically cp1252). When a video title contains emoji or any character outside that code page, printing the report raises UnicodeEncodeError and watch.py exits 1 — after the download, frame extraction, and transcription have already succeeded. The frame paths are never printed, so the whole run is lost.

Emoji in titles are common on YouTube, so on a default Windows setup this breaks a large share of videos.

Repro (Windows, e.g. German locale / cp1252)

python scripts/watch.py https://www.youtube.com/watch?v=pl3n9o_ZR9M
File "...\Lib\encodings\cp1252.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\U0001f92f' in position 41: character maps to <undefined>

(The title is Anfängerin überrascht ALLE! 🤯 Angeln lernen von A bis Z 🎣 — the 🤯 kills the report.)
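
The failure can be reproduced on any OS without a Windows console. The sketch below is an illustration, not part of the patch: it assumes an `io.TextIOWrapper` over an in-memory buffer is a fair stand-in for a cp1252 console stream, and the variable names are made up for the example.

```python
import io

# Stand-in for a Windows console stream that uses the cp1252 code page.
console = io.TextIOWrapper(io.BytesIO(), encoding="cp1252")

# Shortened version of the title from the repro above.
title = "Anfängerin überrascht ALLE! 🤯"

try:
    console.write(title)
    console.flush()
    error = None
except UnicodeEncodeError as exc:
    error = exc

# The umlauts exist in cp1252; the emoji is the character the codec rejects.
offending = error.object[error.start] if error else None
print(ascii(offending))
```

The umlauts in the title encode fine in cp1252, which is why ordinary German titles work and only the emoji triggers the crash.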

Fix

Reconfigure sys.stdout/sys.stderr to UTF-8 with errors="replace" when the entry point starts. This is effectively a no-op on macOS/Linux (where the streams are normally UTF-8 already) and is wrapped in try/except for exotic streams that don't support reconfigure().

Since watch.py imports all other modules into the same process, this covers the whole pipeline.
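
A minimal sketch of what such a startup guard could look like. The helper name `force_utf8_streams` and the exact exception list are assumptions made for illustration and are not taken from the diff. Only the `reconfigure(encoding="utf-8", errors="replace")` call is confirmed by this thread.

```python
import sys


def force_utf8_streams(streams=None):
    """Best-effort switch of text streams to UTF-8 (hypothetical helper)."""
    if streams is None:
        streams = (sys.stdout, sys.stderr)
    for stream in streams:
        try:
            # TextIOWrapper.reconfigure() is available since Python 3.7.
            stream.reconfigure(encoding="utf-8", errors="replace")
        except (AttributeError, ValueError, OSError):
            # Replaced or missing streams (io.StringIO, None, custom
            # wrappers) may not support reconfigure(); leave them alone.
            pass


# Call once, before anything is printed.
force_utf8_streams()
```

With UTF-8 as the target encoding, `errors="replace"` only matters for text that UTF-8 itself cannot encode, such as lone surrogates. Emoji and arrows are written unchanged.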

Verification

On Windows 11 (cp1252 console):

  • Before: the repro above crashes with exit 1.
  • After: the same command prints the full report including the emoji title, exit 0.
  • python -c "print('🤯')" still crashes in the same shell, confirming the environment itself reproduces the bug and the fix is what resolves it.

Complements #4, which fixed the same class of issue for config file I/O.

… Windows

On Windows, Python's stdout/stderr default to the locale code page
(typically cp1252). When a video title contains emoji or any other
character outside that code page, printing the report raises UnicodeEncodeError
and watch.py exits 1 after all the work (download, frames, transcript)
has already succeeded - the frame paths are never printed.

Reconfigure both streams to UTF-8 with errors='replace' at startup.
No-op on macOS/Linux where the streams are UTF-8 already; guarded for
exotic streams that don't support reconfigure().

Repro (any video with emoji in the title, e.g. on a German locale):
  python scripts/watch.py https://www.youtube.com/watch?v=pl3n9o_ZR9M
  -> UnicodeEncodeError: 'charmap' codec can't encode character '\U0001f92f'

Complements bradautomates#4, which fixed the same class of issue for config files.

RomeTheDev-AT commented

Independent confirmation on a separate machine — Windows 11, German locale
(cp1252), Python 3.12.10. The fix is correct and resolves the crash for me.

One addition worth noting: it covers a second trigger beyond emoji titles.
Focused mode (--start/--end) prints a → (U+2192) in the "Focus range"
line of the report, which also isn't in cp1252 and crashes identically before
this patch:

python scripts/watch.py video.mp4 --start 0:30 --end 0:36
...
UnicodeEncodeError: 'charmap' codec can't encode character '\u2192'
in position 25: character maps to <undefined>

After reconfigure(encoding="utf-8", errors="replace") the full report prints
and exits 0 — for both emoji titles and the range line. So the fix
generalizes to every non-cp1252 glyph in the report, not just titles. +1 to merge.
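
To see which characters in a given report line fall outside a code page, a small check along these lines may help. It is a sketch only: `unencodable_chars` is a hypothetical helper, and the sample line is an assumed wording of the "Focus range" output, not copied from watch.py.

```python
def unencodable_chars(text, encoding="cp1252"):
    """Return the distinct characters of `text` that `encoding` cannot represent."""
    bad = []
    for ch in text:
        try:
            ch.encode(encoding)
        except UnicodeEncodeError:
            if ch not in bad:
                bad.append(ch)
    return bad


# Assumed wording of the focused-mode report line.
line = "Focus range: 0:30 → 0:36"
print([f"U+{ord(c):04X}" for c in unencodable_chars(line)])  # prints ['U+2192']
```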
