Fix WHEP playback: track retention, vanilla-ICE offer, RTP depacketization, playout pacing by mkulaczkowski · Pull Request #1919 · HaishinKit/HaishinKit.swift

mkulaczkowski · 2026-06-11T15:47:32Z

Summary

WHEP playback via HTTPSession currently cannot deliver a frame: connect() always throws, and even with negotiation fixed, the RTP→decode pipeline drops everything. This PR fixes the full chain, verified end-to-end against a live MediaMTX server (0 frames → sustained ~30 fps on macOS and iOS).

Fixes (in pipeline order)

Playback tracks were deleted before negotiation. RTCTrack.deinit calls rtcDeleteTrack, and connect(.playback) discarded the addTrack(...) results — libdatachannel then threw "No DataChannel or Track to negotiate" on setLocalDescription, so connect always failed. The tracks are now retained for the session lifetime.
rtcCreateOffer is invalid under the default auto-negotiation config: unless disableAutoNegotiation is set, it returns RTC_ERR_FAILURE (rtc.h marks it "for specific use cases only"). Replaced with the canonical flow: setLocalDescription → wait for ICE gathering to complete → read rtcGetLocalDescription. The offer then also contains the gathered candidates, which this non-trickle client (no PATCH) requires; otherwise the server may never reach it.
HTTP error responses were parsed as SDP. requestOffer ignored the status code, so e.g. MediaMTX's JSON error bodies went into setRemoteDescription ("Remote description has no ICE user fragment"). Non-2xx now throws with status + body.
Audio-incompatible streams rejected the whole offer. MediaMTX answers 400 "codecs not supported by client" when the offered audio m-line can't be satisfied (any stream without Opus). Playback now retries once with a video-only offer; Opus streams keep audio+video.
Basic auth. URL userinfo (https://user:pass@host/...) is converted into an Authorization: Basic header (URLSession does not transmit userinfo; MediaMTX authenticates WHIP publishing this way).
Single hardcoded H264 profile in the offer. The offer advertised only 42e01f, so streams with different profile bytes (plain/constrained baseline from x264, main, high) were rejected. The offer now lists multiple profile variants (42e01f/42c01f/42001f/4d001f/64001f), as browsers do; the server's answer picks one and the depacketizer follows the negotiated description.
RTPJitterBuffer could never start, and would wedge permanently once it did. It expected the first RTP sequence number to be 0 (it is random per RFC 3550 §5.1), and its advance-by-one stale handling let expectedSequence run past the live sequence after the first loss or reorder, after which nothing matched again. It now primes from the first packet, drops late packets wrap-aware, and jumps over gaps once the reorder window fills.
STAP-A (RFC 6184 §5.7.1) was unimplemented. MediaMTX/Pion delivers SPS/PPS aggregated in STAP-A, so the decoder never received parameter sets and no frame was ever emitted.
FU-A emitted per-fragment instead of per-access-unit. A completed FU-A is one NAL (one slice); multi-slice frames (e.g. x264 zerolatency) were handed to VideoToolbox as partial frames → kVTVideoDecoderBadDataErr on every frame. NAL units now accumulate into the access unit and emit on the RTP marker.
In-place AVCC conversion corrupted access units. A written 4-byte length in 256–511 is 00 00 01 xx, which toNALFileFormat's continuing reverse scan re-matched as a start code. The AVCC buffer is now built forward from the parsed NAL units.
Video-only playback froze on the first frame. MediaLink pacing preferred audioPlayer.currentTime, which is 0 forever for an attached-but-idle AudioPlayerNode; the audio clock is now used only while it advances.
Decode failures were silent (guard let imageBuffer else { return } discarded the status). They're now logged (throttled). Also: the macOS DisplayLink reported a zero frame interval with the default preferredFramesPerSecond = 0, so elapsed-time consumers never advanced; it falls back to the display refresh period.
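The sequencing rules from the RTPJitterBuffer fix (prime from the first packet, drop late packets wrap-aware, skip a gap once the reorder window fills) can be sketched as follows. This is a minimal illustration under stated assumptions, not the code in this PR: the type name `SequenceJitterBuffer`, the `reorderWindow` parameter and its default of 16 are hypothetical, and buffered payloads are simply kept in a dictionary keyed by the 16-bit sequence number.

```swift
/// Hypothetical jitter buffer illustrating the sequencing rules only.
/// Not HaishinKit's RTPJitterBuffer; names and the window size are assumptions.
struct SequenceJitterBuffer<Payload> {
    /// How many out-of-order packets to hold before declaring a missing one lost.
    let reorderWindow: Int
    /// Next sequence number to deliver; nil until the first packet arrives.
    private var expected: UInt16?
    private var pending: [UInt16: Payload] = [:]

    init(reorderWindow: Int = 16) {
        precondition(reorderWindow >= 1, "reorder window must hold at least one packet")
        self.reorderWindow = reorderWindow
    }

    /// Signed distance from `a` to `b` modulo 2^16: positive means `b` is newer.
    static func distance(from a: UInt16, to b: UInt16) -> Int {
        Int(Int16(bitPattern: b &- a))
    }

    /// Accepts one packet and returns every payload that is now deliverable, in order.
    mutating func push(sequence: UInt16, payload: Payload) -> [Payload] {
        // Prime from the first packet: RTP sequence numbers start at a random value.
        guard let next = expected else {
            expected = sequence &+ 1
            return [payload]
        }
        // Late or duplicate packet (already delivered or skipped): drop it.
        if Self.distance(from: next, to: sequence) < 0 {
            return []
        }
        pending[sequence] = payload
        return drain()
    }

    private mutating func drain() -> [Payload] {
        var ready: [Payload] = []
        while let next = expected {
            if let payload = pending.removeValue(forKey: next) {
                ready.append(payload)
                expected = next &+ 1
            } else if pending.count >= reorderWindow,
                      let oldest = pending.keys.min(by: {
                          Self.distance(from: next, to: $0) < Self.distance(from: next, to: $1)
                      }) {
                // Window full: treat `next` as lost and resume at the closest buffered packet.
                expected = oldest
            } else {
                break
            }
        }
        return ready
    }
}
```

Comparing sequence numbers through `Int16(bitPattern:)` of the wrapped difference treats anything up to 32767 ahead as newer, so a stream whose random start is near 65535 wraps to 0 without its next packets being mistaken for late ones.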

Verification

macOS harness driving HTTPSessionFactory → .playback against MediaMTX (H.264 + Opus via RTSP ingest, and H.264 + AAC / video-only variants): SDP negotiation, ICE connect, sustained decoded frames; the audio-fallback matrix behaves as described.
Same patch set is running in a production iOS app (device-to-device through MediaMTX) — both WHEP playback and WHIP publishing (.publish mode uses the same negotiation flow).
swift build clean on this branch.

Happy to split this into smaller PRs if preferred.

…n, playout pacing

Fixes a chain of issues that prevented HTTP (WHEP) playback from ever delivering a frame, verified end-to-end against MediaMTX (0 frames -> ~30fps):

- HTTPSession: retain playback tracks. RTCTrack.deinit calls rtcDeleteTrack, so discarding the addTrack(...) result deleted the native tracks before negotiation ("No DataChannel or Track to negotiate" -> connect always threw).
- HTTPSession: libdatachannel's rtcCreateOffer requires disableAutoNegotiation and returns RTC_ERR_FAILURE under the default config. Use the canonical flow: setLocalDescription, wait for ICE gathering to complete, then read rtcGetLocalDescription. The offer then also carries the gathered candidates, which a non-trickle client needs for the server to reach it.
- HTTPSession: throw on non-2xx WHEP/WHIP responses instead of feeding error bodies into setRemoteDescription ("Remote description has no ICE user fragment").
- HTTPSession: retry playback with a video-only offer when the server rejects the audio m-line (e.g. MediaMTX "codecs not supported by client" for any stream without Opus audio). Opus streams keep audio+video.
- HTTPSession: convert URL userinfo into an HTTP Basic Authorization header (URLSession does not transmit userinfo; MediaMTX authenticates WHIP this way).
- RTCPeerConnection: offer multiple H264 profile variants (42e01f/42c01f/42001f/4d001f/64001f) like browsers do; a single hardcoded profile is rejected for streams whose profile bytes differ. Adds currentLocalDescription() wrapping rtcGetLocalDescription.
- RTPJitterBuffer: prime the expected sequence from the first packet (RTP sequence numbers start at a random value per RFC 3550), drop late packets wrap-aware, and jump over gaps once the reorder window fills. The previous advance-by-one stale handling let expectedSequence run past the live sequence after the first loss, permanently stalling delivery.
- RTPH264Packetizer: implement STAP-A (RFC 6184 5.7.1); SPS/PPS commonly arrive aggregated, and without it no frame ever decodes. Accumulate FU-A NAL units into full access units and emit on the RTP marker (multi-slice frames were emitted per-slice -> kVTVideoDecoderBadDataErr). Build the AVCC buffer forward from parsed units; the in-place start-code rewrite corrupted access units whose NAL lengths fall in 256-511 (a written length 00 00 01 xx re-matches as a start code).
- MediaLink: use the audio clock for playout pacing only while it is advancing; an attached but silent AudioPlayerNode reports currentTime == 0 forever, pinning video at the first frame for video-only streams.
- VTDecompressionSession: log decode failures (throttled) instead of silently dropping them.
- DisplayLinkChoreographer (macOS): fall back to the display refresh period when frameInterval is 0 so elapsed-time consumers advance.
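The RTPH264Packetizer item combines three rules: split STAP-A aggregates, reassemble FU-A fragments into one NAL, and collect NAL units until the RTP marker before writing one AVCC buffer forward with 4-byte big-endian lengths. The sketch below illustrates those rules under stated assumptions; it is a hypothetical standalone type (`H264AccessUnitAssembler`), not HaishinKit's packetizer, and it assumes RTP headers are already stripped and packets arrive in order (for example, out of a jitter buffer).

```swift
/// Hypothetical H.264 RTP depacketizer sketch (RFC 6184): single NAL, STAP-A (type 24)
/// and FU-A (type 28). Not HaishinKit's RTPH264Packetizer.
struct H264AccessUnitAssembler {
    /// Complete NAL units collected for the current access unit.
    private var nalUnits: [[UInt8]] = []
    /// FU-A reassembly buffer; nil when no fragmented NAL is in progress.
    private var fragment: [UInt8]?

    /// Feeds one RTP payload; returns an AVCC access unit when `marker` is set.
    mutating func push(payload: [UInt8], marker: Bool) -> [UInt8]? {
        guard let first = payload.first else { return nil }
        switch first & 0x1F {
        case 1...23:
            nalUnits.append(payload)  // single NAL unit packet
        case 24:
            // STAP-A: after the 1-byte header, repeated [16-bit size][NAL unit].
            var i = 1
            while i + 2 <= payload.count {
                let size = Int(payload[i]) << 8 | Int(payload[i + 1])
                i += 2
                guard size > 0, i + size <= payload.count else { break }
                nalUnits.append(Array(payload[i..<i + size]))
                i += size
            }
        case 28:
            // FU-A: FU indicator, FU header (S/E bits + original NAL type), fragment data.
            guard payload.count >= 2 else { return nil }
            let header = payload[1]
            if header & 0x80 != 0 {
                // Start bit: rebuild the NAL header from the indicator's F/NRI and the original type.
                fragment = [(first & 0xE0) | (header & 0x1F)]
            }
            fragment?.append(contentsOf: payload[2...])
            if header & 0x40 != 0, let nal = fragment {
                nalUnits.append(nal)  // one complete NAL (e.g. one slice), not yet a whole frame
                fragment = nil
            }
        default:
            break  // STAP-B, MTAP and FU-B are out of scope for this sketch
        }
        return marker ? flush() : nil
    }

    /// Writes all collected NAL units into a fresh AVCC buffer, front to back.
    private mutating func flush() -> [UInt8]? {
        defer { nalUnits.removeAll(); fragment = nil }
        guard !nalUnits.isEmpty else { return nil }
        var avcc: [UInt8] = []
        for nal in nalUnits {
            let n = UInt32(nal.count)
            avcc += [UInt8(n >> 24), UInt8((n >> 16) & 0xFF), UInt8((n >> 8) & 0xFF), UInt8(n & 0xFF)]
            avcc += nal
        }
        return avcc
    }
}
```

Because the length prefixes go into a new buffer instead of overwriting start codes in place, a 300-byte NAL simply gets the prefix 00 00 01 2C and is never re-scanned as a start code.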

mkulaczkowski wants to merge 1 commit into HaishinKit:main from mkulaczkowski:fix/whep-playback