Fix WHEP playback: track retention, vanilla-ICE offer, RTP depacketization, playout pacing #1919

Open
mkulaczkowski wants to merge 1 commit into HaishinKit:main from mkulaczkowski:fix/whep-playback

@mkulaczkowski

Summary

WHEP playback via HTTPSession currently cannot deliver a frame: connect() always throws, and even with negotiation fixed, the RTP→decode pipeline drops everything. This PR fixes the full chain, verified end-to-end against a live MediaMTX server (0 frames → sustained ~30 fps on macOS and iOS).

Fixes (in pipeline order)

  1. Playback tracks were deleted before negotiation. RTCTrack.deinit calls rtcDeleteTrack, and connect(.playback) discarded the addTrack(...) results — libdatachannel then threw "No DataChannel or Track to negotiate" on setLocalDescription, so connect always failed. The tracks are now retained for the session lifetime.
  2. rtcCreateOffer is invalid under the default auto-negotiation config (returns RTC_ERR_FAILURE; rtc.h marks it "for specific use cases only"). Replaced with the canonical flow: setLocalDescription → wait for ICE gathering to complete → read rtcGetLocalDescription. The offer then also contains the gathered candidates — required for this non-trickle client (no PATCH), otherwise the server may never reach it.
  3. HTTP error responses were parsed as SDP. requestOffer ignored the status code, so e.g. MediaMTX's JSON error bodies went into setRemoteDescription ("Remote description has no ICE user fragment"). Non-2xx now throws with status + body.
  4. Audio-incompatible streams rejected the whole offer. MediaMTX answers 400 "codecs not supported by client" when the offered audio m-line can't be satisfied (any stream without Opus). Playback now retries once with a video-only offer; Opus streams keep audio+video.
  5. Basic auth. URL userinfo (https://user:pass@host/...) is converted into an Authorization: Basic header (URLSession does not transmit userinfo; MediaMTX authenticates WHIP publishing this way).
  6. Single hardcoded H264 profile in the offer. The offer advertised only profile-level-id 42e01f, so streams with different profile bytes (plain/constrained baseline from x264, main, high) were rejected. The offer now lists multiple profile variants like browsers do; the server's answer picks one and the depacketizer follows the negotiated description.
  7. RTPJitterBuffer could never start, then wedged permanently. It expected the first RTP sequence to be 0 (it's random per RFC 3550 §5.1), and its advance-by-one stale handling let expectedSequence run past the live sequence after the first loss/reorder — after which nothing matched again. Now primes from the first packet, drops late packets wrap-aware, and jumps over gaps once the reorder window fills.
  8. STAP-A (RFC 6184 §5.7.1) was unimplemented. MediaMTX/Pion delivers SPS/PPS aggregated in STAP-A, so the decoder never received parameter sets and no frame was ever emitted.
  9. FU-A output was emitted per NAL unit instead of per access unit. A reassembled FU-A is one NAL unit (one slice), so multi-slice frames (e.g. x264 zerolatency) were handed to VideoToolbox as partial frames → kVTVideoDecoderBadDataErr on every frame. NAL units now accumulate into the access unit and emit on the RTP marker.
  10. In-place AVCC conversion corrupted access units. A written 4-byte length in 256–511 is 00 00 01 xx, which toNALFileFormat's continuing reverse scan re-matched as a start code. The AVCC buffer is now built forward from the parsed NAL units.
  11. Video-only playback froze on the first frame. MediaLink pacing preferred audioPlayer.currentTime, which is 0 forever for an attached-but-idle AudioPlayerNode; the audio clock is now used only while it advances.
  12. Decode failures were silent (guard let imageBuffer else { return } discarded the status). They're now logged (throttled). Also: the macOS DisplayLink reported a zero frame interval with the default preferredFramesPerSecond = 0, so elapsed-time consumers never advanced; it falls back to the display refresh period.
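
For reviewers, the sequence handling described in fix 7 can be sketched roughly as below. This is an illustrative sketch, not the code in this PR: `SequenceTracker`, `Verdict`, and `sequenceDistance` are hypothetical names, and the gap jump once the reorder window fills is left out. It only shows priming from the first packet and a wrap-aware late/early decision using 16-bit modular arithmetic.

```swift
/// Hypothetical helper (not HaishinKit API): signed distance from `expected`
/// to `sequence` in 16-bit modular arithmetic. Negative means the packet is
/// older than expected, positive means it is ahead.
func sequenceDistance(from expected: UInt16, to sequence: UInt16) -> Int16 {
    Int16(bitPattern: sequence &- expected)
}

/// Hypothetical tracker illustrating priming and wrap-aware classification.
struct SequenceTracker {
    enum Verdict { case inOrder, early, late }

    /// nil until the first packet arrives; RTP sequence numbers start at a
    /// random value, so nothing can be assumed before that.
    private(set) var expected: UInt16?

    mutating func classify(_ sequence: UInt16) -> Verdict {
        guard let expected = expected else {
            // Prime from the first packet instead of assuming 0.
            self.expected = sequence &+ 1
            return .inOrder
        }
        let distance = sequenceDistance(from: expected, to: sequence)
        if distance == 0 {
            // Exactly the packet we were waiting for; advance with wrap-around.
            self.expected = sequence &+ 1
            return .inOrder
        }
        // Ahead of expected: hold for reordering. Behind: drop as late.
        return distance > 0 ? .early : .late
    }
}
```

With this comparison, a packet numbered 65535 arriving after 0 is classified as late rather than as a far-future packet, which is the wrap case the old advance-by-one logic got wrong.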

Verification

  • macOS harness driving HTTPSessionFactory.playback against MediaMTX (H.264 + Opus via RTSP ingest, and H.264 + AAC / video-only variants): SDP negotiation, ICE connect, sustained decoded frames; the audio-fallback matrix behaves as described.
  • Same patch set is running in a production iOS app (device-to-device through MediaMTX) — both WHEP playback and WHIP publishing (.publish mode uses the same negotiation flow).
  • swift build clean on this branch.

Happy to split this into smaller PRs if preferred.

…n, playout pacing

Fixes a chain of issues that prevented HTTP (WHEP) playback from ever
delivering a frame, verified end-to-end against MediaMTX (0 frames -> ~30fps):

- HTTPSession: retain playback tracks. RTCTrack.deinit calls rtcDeleteTrack,
  so discarding the addTrack(...) result deleted the native tracks before
  negotiation ("No DataChannel or Track to negotiate" -> connect always threw).
- HTTPSession: libdatachannel's rtcCreateOffer requires disableAutoNegotiation
  and returns RTC_ERR_FAILURE under the default config. Use the canonical flow:
  setLocalDescription, wait for ICE gathering to complete, then read
  rtcGetLocalDescription - the offer then also carries the gathered candidates,
  which a non-trickle client needs for the server to reach it.
- HTTPSession: throw on non-2xx WHEP/WHIP responses instead of feeding error
  bodies into setRemoteDescription ("Remote description has no ICE user
  fragment").
- HTTPSession: retry playback with a video-only offer when the server rejects
  the audio m-line (e.g. MediaMTX "codecs not supported by client" for any
  stream without Opus audio). Opus streams keep audio+video.
- HTTPSession: convert URL userinfo into an HTTP Basic Authorization header
  (URLSession does not transmit userinfo; MediaMTX authenticates WHIP this way).
- RTCPeerConnection: offer multiple H264 profile variants (42e01f/42c01f/
  42001f/4d001f/64001f) like browsers do; a single hardcoded profile is
  rejected for streams whose profile bytes differ. Adds
  currentLocalDescription() wrapping rtcGetLocalDescription.
- RTPJitterBuffer: prime the expected sequence from the first packet (RTP
  sequence numbers start at a random value per RFC 3550), drop late packets
  wrap-aware, and jump over gaps once the reorder window fills. The previous
  advance-by-one stale handling let expectedSequence run past the live
  sequence after the first loss, permanently stalling delivery.
- RTPH264Packetizer: implement STAP-A (RFC 6184 5.7.1) - SPS/PPS commonly
  arrive aggregated; without it no frame ever decodes. Accumulate FU-A NAL
  units into full access units and emit on the RTP marker (multi-slice frames
  were emitted per-slice -> kVTVideoDecoderBadDataErr). Build the AVCC buffer
  forward from parsed units; the in-place start-code rewrite corrupted access
  units whose NAL lengths fall in 256-511 (a written length 00 00 01 xx
  re-matches as a start code).
- MediaLink: use the audio clock for playout pacing only while it is
  advancing; an attached but silent AudioPlayerNode reports currentTime == 0
  forever, pinning video at the first frame for video-only streams.
- VTDecompressionSession: log decode failures (throttled) instead of silently
  dropping them.
- DisplayLinkChoreographer (macOS): fall back to the display refresh period
  when frameInterval is 0 so elapsed-time consumers advance.
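
A rough sketch of the STAP-A split and forward AVCC build described in the RTPH264Packetizer entry above. `splitSTAPA` and `makeAVCC` are hypothetical helpers written for illustration, not the functions in this commit. The sketch assumes the RFC 6184 STAP-A layout (one header byte with NAL type 24, then repeated 2-byte big-endian size plus NAL unit) and 4-byte AVCC length prefixes.

```swift
import Foundation

/// Hypothetical helper: splits a STAP-A payload (RFC 6184 5.7.1) into its
/// aggregated NAL units. Returns an empty array if the payload is not STAP-A.
func splitSTAPA(_ payload: Data) -> [Data] {
    let bytes = [UInt8](payload)
    // The low 5 bits of the first byte carry the NAL unit type; 24 is STAP-A.
    guard let header = bytes.first, (header & 0x1f) == 24 else { return [] }
    var units: [Data] = []
    var offset = 1
    while offset + 2 <= bytes.count {
        // Each aggregated unit is preceded by a 16-bit big-endian size.
        let size = (Int(bytes[offset]) << 8) | Int(bytes[offset + 1])
        offset += 2
        // Stop on a zero or truncated unit instead of reading out of bounds.
        guard size > 0, offset + size <= bytes.count else { break }
        units.append(Data(bytes[offset..<(offset + size)]))
        offset += size
    }
    return units
}

/// Hypothetical helper: builds an AVCC buffer forward from parsed NAL units,
/// writing a 4-byte big-endian length before each one. Nothing is rescanned
/// for start codes, so a length such as 00 00 01 00 (256) cannot be
/// misread as a start code.
func makeAVCC(from nalUnits: [Data]) -> Data {
    var buffer = Data()
    for nal in nalUnits {
        let length = UInt32(nal.count)
        buffer.append(contentsOf: [
            UInt8(truncatingIfNeeded: length >> 24),
            UInt8(truncatingIfNeeded: length >> 16),
            UInt8(truncatingIfNeeded: length >> 8),
            UInt8(truncatingIfNeeded: length)
        ])
        buffer.append(nal)
    }
    return buffer
}

// Usage: a STAP-A carrying a 2-byte SPS stub and a 2-byte PPS stub.
let aggregated = Data([0x78, 0x00, 0x02, 0x67, 0x42, 0x00, 0x02, 0x68, 0xce])
let parameterSets = splitSTAPA(aggregated)
let avcc = makeAVCC(from: parameterSets)
```

Building forward from the parsed units avoids the in-place rewrite entirely, which is why lengths in the 256-511 range stop being a problem.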