Skip to content

fix: race condition between close() and _encode_sync in H264LiveEncoder#292

Open
zackzmai wants to merge 2 commits into
XiaoMi:mainfrom
zackzmai:fix/transcode-encode-race
Open

fix: race condition between close() and _encode_sync in H264LiveEncoder#292
zackzmai wants to merge 2 commits into
XiaoMi:mainfrom
zackzmai:fix/transcode-encode-race

Conversation

@zackzmai

Copy link
Copy Markdown
Contributor

Problem

H264LiveEncoder.close() sets self._codec = None on the main thread to signal concurrent _encode_sync calls to stop. However, _encode_sync may have already passed the assert self._codec is not None guard and be about to call self._codec.encode(frame) — at which point self._codec becomes None, causing:

'NoneType' object has no attribute 'encode'

This triggers whenever a WebSocket video-stream client disconnects while a frame encode is in-flight, and is visible in logs as repeated transcode encode error entries.

Fix

Capture self._codec into a local variable right after the assertion and use that local for the encode loop. close() can safely set self._codec = None without affecting an in-flight encode.

One-line semantic change, no API or behavioral difference.

close() sets self._codec = None on the main thread to signal concurrent
_encode_sync calls to stop. However, _encode_sync may have already passed
the 'assert self._codec is not None' check and be about to call
self._codec.encode(frame) — at which point self._codec becomes None,
causing 'NoneType has no attribute encode'.

Fix: capture self._codec into a local variable right after the assertion
and use that local for the encode loop. close() can safely set
self._codec = None without affecting an in-flight encode.
@github-actions

Copy link
Copy Markdown

👋 感谢提交 PR @zackzmai!维护者会尽快 review。

提交前请确认:

  • CI 全绿(test / lint / build)
  • 改动聚焦单一主题,便于审阅
  • 若改动了依赖(lockfile / pyproject.toml / package.json),需维护者评论 /allow-dependencies-change <当前 head SHA> 放行(之后再 push 需重新放行)

@yangbaofu007

Copy link
Copy Markdown
Collaborator

🟡 [Review] backend/miloco/src/miloco/miot/transcoder.py:134-137 — 局部捕获放在断言之后,竞态窗口缩小但没消除,同一个异常仍可复现

背景_encode_sync 跑在 h264-enc 编码线程,close() 在事件循环主线程上执行 self._codec = None(L179)。两者真并发——即 PR 描述里"客户端断开时某帧编码进行中"的场景。

问题:修复对 self._codec 做了两次独立读取——先断言、再捕获:

assert self._codec is not None   # L134 ← 第 1 次读 self._codec
# Capture codec ref locally ...
codec = self._codec              # L137 ← 第 2 次读 self._codec
...
for packet in codec.encode(frame):   # L151

CPython 可在任意两条字节码之间切线程。若主线程在 L134 和 L137 之间执行 self._codec = None,捕获到的 codec 就是 None,L151 抛出和修复前一模一样的 'NoneType' object has no attribute 'encode'

编码线程 (_encode_sync) 主线程 (close())
L134 assert self._codec is not None → 通过(codec 已设置)
L179 self._codec = None
L137 codec = self._codeccodec = None
L151 codec.encode(frame)'NoneType' object has no attribute 'encode'

窗口比修复前小(原窗口 L134→L151 含 from_ndarray/reformat/pts 赋值多步;现窗口只剩 L134→L137 一次属性 load),复现概率大幅下降但没归零,所以 "close() can safely set self._codec = None without affecting an in-flight encode" 目前兑现不了。

补充:Python 断言在 -O / PYTHONOPTIMIZE 下被整条剥除。若后端以优化模式运行,L134 断言不存在,唯一读取是 L137 捕获,捕获到的 codec 仍可能是 None——断言救不了这个局部捕获。

改进:只读一次 self._codec,把判空作用在捕获到的局部上。因为 close()None 是正常并发关闭、不是编程错误,用早返回比断言更贴语义:

    # Capture codec ref locally FIRST so a single read is checked and used —
    # close() setting self._codec = None on the main thread can't slip in
    # between the check and the encode.
    codec = self._codec
    if codec is None:
        return []

    frame = av.VideoFrame.from_ndarray(bgr, format="bgr24")
    frame = frame.reformat(format="yuv420p")
    ...
    out: list[tuple[bytes, bool]] = []
    for packet in codec.encode(frame):
        out.append((bytes(packet), bool(packet.is_keyframe)))
    return out

若想保留断言来捕捉"非 close 路径下 codec 不应为 None"的编程错误,也要让断言作用在局部上,确保只读一次:

    codec = self._codec
    assert codec is not None   # 校验即将使用的同一个局部,不再二次读 self._codec

@HCl8 HCl8 self-requested a review June 24, 2026 07:42
close() sets self._codec = None on the main thread to block concurrent
_encode_sync calls. The previous fix still read self._codec twice
(assert at L134 then capture at L137), so the main thread could set
None between the two reads and the captured local would be None,
reproducing 'NoneType' object has no attribute 'encode'. Under
PYTHONOPTIMIZE the assert is stripped entirely.

Per review (yangbaofu007): capture self._codec into a local in a single
read and guard with an early return. close() is a normal concurrent
shutdown, not a programming error, so early return fits the semantics
better than an assert. The whole encode loop now references only the
local, so close() setting self._codec = None can no longer affect an
in-flight encode.
@zackzmai

Copy link
Copy Markdown
Contributor Author

@yangbaofu007 采纳,已按建议改为单次读取 + 早返回(commit 40ca318):

codec = self._codec
if codec is None:
    return []

原 assert + 二次读取捕获的窗口已消除:整个 encode 循环只引用局部 codec,主线程 close()self._codec = None 不再影响在途 encode;PYTHONOPTIMIZE 下断言被剥除的问题也随之规避。close 是正常并发关闭,早返回比断言更贴语义。CI 全绿。

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants