Skip to content

Fix bug in Semaphore::timed_wait on POSIX #795

@gavv

Description

@gavv

deadline parameter of Semaphore::timed_wait is supposed to be absolute time in CLOCK_MONOTONIC domain. The code that uses it, as well as your example, pass monotonic timestamps to it.

However in roc_core/target_posix_ext/roc_core/semaphore.cpp we pass deadline to sem_timedwait, which expects time in CLOCK_REALTIME domain (i.e. since epoch).

Note that roc_core/target_darwin/roc_core/semaphore.cpp doesn't have this problem.

Suggested fix:

  1. When sem_clockwait system call is available (we need a compile-time check), then use it instead of sem_timedwait with CLOCK_MONOTONIC clock id.

    This is the most efficient approach.

  2. Otherwise, modify wait loop: each iteration, before calling sem_timedwait, compute wallclock deadline based on current time and monotonic deadline.


Another concern with sem_timedwait is what happens if system time changes.

Seems that on recent linux / glibc it's not affected by time changes, i.e. if we asked it to wait until now + 100 seconds, it will sleep for 100 seconds even if time changed to past or future.

Just verified it using this small program:

Click to expand
#include <errno.h>
#include <semaphore.h>
#include <stdint.h>
#include <stdio.h>
#include <time.h>

int64_t timestamp(clockid_t clock) {
    timespec ts;
    clock_gettime(clock, &ts);
    return (int64_t)ts.tv_sec * 1000000000 + (int64_t)ts.tv_nsec;
}

int main() {
    sem_t sem;
    sem_init(&sem, 0, 0);

    int64_t deadline = timestamp(CLOCK_REALTIME) + 100000000000; // 100 seconds

    timespec ts;
    ts.tv_sec = long(deadline / 1000000000);
    ts.tv_nsec = long(deadline % 1000000000);

    int ret = sem_timedwait(&sem, &ts);

    printf("ret = %d\n", ret);
    perror("errno");
}

It will sleep 100 seconds even if I run sudo date --set="2000-01-01 00:00:00" or sudo date --set="2027-01-01 00:00:00" from another terminal.

However it's not necessary to be true on other POSIX platforms. In worst case, sem_timedwait may hang, but if we're more lucky, it may return a error code.

Currently we panic if error is not ETIMEOUT (= return) and EINTR (= repeat loop). I think we should also allow EINVAL and EAGAIN (and repeat the loop), I suspect they are good candidates for this situation, though I haven't found anything in Internet.


Ideally we should also add a few tests for Semaphore::timed_wait.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No fields configured for Defect.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions