Skip to content

Add concurrency checking based on GenMC - serial queue#561

Merged
Courtney3141 merged 2 commits into
mainfrom
genmc-serial
Nov 24, 2025
Merged

Add concurrency checking based on GenMC - serial queue#561
Courtney3141 merged 2 commits into
mainfrom
genmc-serial

Conversation

@KurtWu10

@KurtWu10 KurtWu10 commented Nov 17, 2025

Copy link
Copy Markdown
Contributor
  • This PR is a variant of PR Add concurrency checking based on GenMC #523 that targets serial queues. It should also fix issue serial example failures on Rasberry Pi 4B #528.

  • The following invalid assertion

    /**
    * Update the value of the tail in the shared data structure to make
    * locally enqueued data visible.
    *
    * @param queue_handle queue to update.
    * @param local_tail tail which points to the last character enqueued.
    */
    static inline void serial_update_shared_tail(serial_queue_handle_t *queue_handle, uint32_t local_tail)
    {
    uint32_t current_length = serial_queue_length(queue_handle);
    uint32_t new_length = local_tail - queue_handle->queue->head;
    /* Ensure updates to tail don't overwrite existing data */
    assert(new_length >= current_length);
    in L172 has also been removed. It is invalid because the consumer can consume multiple queue entries and update the head between L168 and L169.

    A similar assertion in serial_update_shared_head() has also been removed.

  • Queue operations with the _length suffix always use relaxed atomic operation to prevent data races in all situations. These operation do not provide any synchronisation. On the contrary, other operations with _empty / _full / _free suffix use acquire atomic operation appropriately for synchronisation.

Limitations

  • On the memory model side, compared with PR Add concurrency checking based on GenMC #523, this PR uses additional relaxed atomic operations that may be a concern for verifiers like GenMC.

  • The test does not cover all serial queue APIs.

  • There are some redundant memory operations that will not be removed in compiler's dead-code elimination optimisation, e.g. in the updated serial_update_shared_tail().

@KurtWu10 KurtWu10 marked this pull request as ready for review November 17, 2025 06:58
@Ivan-Velickovic Ivan-Velickovic added the hardware-test Run the hardware tests on this PR. label Nov 17, 2025
@KurtWu10 KurtWu10 removed the hardware-test Run the hardware tests on this PR. label Nov 17, 2025
@KurtWu10 KurtWu10 added the hardware-test Run the hardware tests on this PR. label Nov 17, 2025
@KurtWu10 KurtWu10 removed the hardware-test Run the hardware tests on this PR. label Nov 17, 2025
@KurtWu10 KurtWu10 added the hardware-test Run the hardware tests on this PR. label Nov 18, 2025
@KurtWu10 KurtWu10 linked an issue Nov 18, 2025 that may be closed by this pull request

@Courtney3141 Courtney3141 left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm going to make some additional modifications to the serial queue, but otherwise this is a great PR!

Comment thread ci/genmc/genmc.sh
Comment thread ci/genmc/README.md Outdated
Comment thread include/sddf/serial/queue.h Outdated
Comment thread include/sddf/serial/queue.h
Comment thread include/sddf/serial/queue.h
@Courtney3141

Copy link
Copy Markdown
Contributor

Also, I will leave it to you to add the additional comments to all the queue functions:

"This function is only to be called by the CONSUMER/PRODUCER of the queue."

(I'll leave the phrasing up to you)

@Courtney3141

Copy link
Copy Markdown
Contributor

Also, I will leave it to you to add the additional comments to all the queue functions:

"This function is only to be called by the CONSUMER/PRODUCER of the queue."

(I'll leave the phrasing up to you)

Never mind, I did this in my last commit. Please just check that my comments are in fact correct!

Comment thread ci/genmc/os/sddf.h
Comment thread include/sddf/util/fence.h
*/
static inline uint32_t load_acquire_32(const uint32_t *ptr)
{
#ifdef CONFIG_ENABLE_SMP_SUPPORT

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it really worth having this?

I'd thought we'd discussed this with Gernot and decided the queues should just always be correct for cross-core since it's misleading if not.

(and when I tested it made no performance impact).

@KurtWu10 KurtWu10 Nov 21, 2025

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I remember the discussion on GitHub as well. I'll run additional benchmarks on this. The current implementation is based on memory of my recent benchmarks that it increases utilisation by 5~10%.

In addition, I will also benchmark whether the dmb ishst barrier is beneficial to the store-release function, which cannot be represented by the C standard library.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am very surprised by you finding it had no performance impact, as when I tested it a long ago I remember it being very significant. Although I can't seem to find those results for the life of me...

@Courtney3141 Courtney3141 Nov 21, 2025

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I finally found the graph!

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

image

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suspect it probably depends on what system we test on, as they will have different costs? But yeah, IDK. All my tests were run with it forced to 1 as per the docs I had in the description.

@KurtWu10 KurtWu10 Nov 21, 2025

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it is system dependent. There is an about 5-10% system-wide relative overhead on maaxboard (cortex-a53)
Screenshot from 2025-11-21 17-21-41
with higher cycles per packet as well. There is no overhead on odroid c4 (cortex-a55, under investigation). These are unicore UDP benchmarks comparing different implementations of network queues.

@KurtWu10 KurtWu10 removed the hardware-test Run the hardware tests on this PR. label Nov 21, 2025
@Courtney3141

Copy link
Copy Markdown
Contributor

I'm thinking it may be worth adding an introductory comment at the top of the serial queue file saying something along the lines of:

The serial queue, like all sDDF queues, is an implementation of a single-producer, single-consumer FIFO queue. The key assumption being that only the producer is permitted to modify the tail, and only the consumer is permitted to modify the head. Both components are permitted to read both indices. The library's atomic operations are written to ensure correctness under these assumptions, thus each function's description contains an explicit notes on its assumed caller.

@Courtney3141 Courtney3141 force-pushed the genmc-serial branch 4 times, most recently from 4711f06 to 8a99a20 Compare November 24, 2025 05:32
Signed-off-by: Ivan Velickovic <i.velickovic@unsw.edu.au>
Signed-off-by: Kurt Wu <rihui.wu@unsw.edu.au>
Signed-off-by: Kurt Wu <rihui.wu@unsw.edu.au>
@Courtney3141 Courtney3141 merged commit 82e1306 into main Nov 24, 2025
14 checks passed
@Courtney3141 Courtney3141 deleted the genmc-serial branch November 24, 2025 05:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

serial example failures on Rasberry Pi 4B

4 participants