
Add rpi4 genet driver #650

Draft
terryzbai wants to merge 8 commits into main from rpi4_genet_driver



@terryzbai commented Mar 3, 2026

Contributor

The implementation mainly draws on the Linux, U-Boot and RT-Thread source code due to the lack of documentation. Whereas Linux uses the second IRQ for separate rings, we use only the default ring (i.e., ring 16) and the first IRQ for Rx/Tx status updates.

The GENET NIC can offload only one checksum at a time, either at the network layer (e.g., IP) or at the transport layer (e.g., TCP/UDP), so the checksum for the other layer must be calculated in software. To enable hardware checksum offload, a 64-byte block must be prepended to each Rx/Tx packet, which shifts the actual payload to byte offset 64. In addition, software must calculate the pseudo-header checksum (a constant-time cost) and write it into the checksum field. A rough benchmark shows that offloading TCP/UDP checksum calculation to hardware saves around 8.5% of total CPU utilisation.
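To make the pseudo-header step concrete, here is a minimal C sketch of the standard IPv4 pseudo-header sum used by TCP and UDP (source address, destination address, a zero byte, the protocol number and the TCP/UDP length). The helper names are hypothetical, not the driver's. It also assumes GENET follows the same convention as Linux's CHECKSUM_PARTIAL offload, where the checksum field is seeded with the folded but non-inverted pseudo-header sum and the hardware adds the segment bytes; that convention should be checked against the driver.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Accumulate big-endian 16-bit words of buf into a 32-bit one's complement sum. */
static uint32_t csum_accumulate(uint32_t sum, const uint8_t *buf, size_t len)
{
    while (len > 1) {
        sum += ((uint32_t)buf[0] << 8) | buf[1];
        buf += 2;
        len -= 2;
    }
    if (len == 1) {
        sum += (uint32_t)buf[0] << 8; /* odd trailing byte is zero-padded */
    }
    return sum;
}

/* Fold carries back into the low 16 bits (end-around carry). */
static uint16_t csum_fold(uint32_t sum)
{
    while (sum >> 16) {
        sum = (sum & 0xFFFFu) + (sum >> 16);
    }
    return (uint16_t)sum;
}

/* IPv4 pseudo-header sum: always five fixed-size fields, so the cost does not
 * depend on packet length. Returned folded but NOT inverted. */
static uint16_t ipv4_pseudo_header_sum(const uint8_t src[4], const uint8_t dst[4],
                                       uint8_t proto, uint16_t l4_len)
{
    uint32_t sum = 0;
    sum = csum_accumulate(sum, src, 4);
    sum = csum_accumulate(sum, dst, 4);
    sum += proto;  /* the zero byte followed by proto forms one 16-bit word */
    sum += l4_len;
    return csum_fold(sum);
}

/* Seed the L4 checksum field (at csum_off within the segment), big-endian.
 * ASSUMPTION: the hardware adds the segment bytes and writes the complement. */
static void seed_l4_checksum(uint8_t *segment, size_t seg_len, size_t csum_off,
                             uint16_t pseudo)
{
    assert(csum_off + 2 <= seg_len);
    segment[csum_off] = (uint8_t)(pseudo >> 8);
    segment[csum_off + 1] = (uint8_t)(pseudo & 0xFF);
}
```

For a UDP segment the checksum field sits at offset 6 of the UDP header (offset 16 for TCP), so the driver-side work is one call to each helper per packet, independent of payload size.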

Unlike most other NICs, the driver must explicitly ring the device's doorbell, by updating prod_index of the Tx ring or cons_index of the Rx ring, whenever there is work to do. However, each doorbell write takes over 300 cycles, so the hardware has likely finished the work before the driver exits the loop in handle_irq(), which undermines packet batching.
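One way to avoid paying the 300+ cycle doorbell cost per packet is to fill several descriptors and write the producer index once per batch. The sketch below models that with a free-running, wrapping 16-bit index; struct tx_ring, TX_RING_SIZE and the plain variable standing in for the MMIO register are all hypothetical, and this is not necessarily how the GENET driver handles it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* HYPOTHETICAL ring size; the real GENET ring geometry may differ. */
#define TX_RING_SIZE 256u

/* HYPOTHETICAL Tx ring model with a free-running 16-bit producer index. */
struct tx_ring {
    uint16_t prod;                /* next descriptor the driver will fill */
    uint16_t cons;                /* last consumer index reported by the device */
    volatile uint32_t *prod_reg;  /* doorbell register (a plain variable in this sketch) */
    unsigned doorbells;           /* count of doorbell writes, for illustration */
};

/* Free slots; the uint16_t cast keeps the subtraction correct across wraparound. */
static unsigned tx_ring_free(const struct tx_ring *r)
{
    return TX_RING_SIZE - (uint16_t)(r->prod - r->cons);
}

/* Queue up to n packets, then ring the doorbell once for the whole batch. */
static unsigned tx_enqueue_batch(struct tx_ring *r, unsigned n)
{
    assert(r != NULL && r->prod_reg != NULL);
    unsigned queued = 0;
    while (queued < n && tx_ring_free(r) > 0) {
        /* A real driver would fill descriptor (r->prod % TX_RING_SIZE) here. */
        r->prod++;
        queued++;
    }
    if (queued > 0) {
        *r->prod_reg = r->prod;  /* one expensive MMIO write per batch, not per packet */
        r->doorbells++;
    }
    return queued;
}
```

The trade-off is latency: holding the doorbell until a batch is ready delays the first packet, so the batch boundary would naturally be the end of one pass over the sDDF queues.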

Basic benchmark results (the data has been added to the internal benchmark spreadsheets):

[Benchmark figures: rpi4_udp, rpi4_tcp]

A minor issue occurred during the full benchmark: "memp_malloc: out of memory in pool TCP_PCB" is printed after exactly 10 benchmarks. This could be fixed later in a separate commit.

@terryzbai force-pushed the rpi4_genet_driver branch from cb0f18d to 72ba56b March 4, 2026 00:14
@terryzbai added the drivers, driver-examples and network labels Mar 4, 2026
@terryzbai requested a review from Courtney3141 March 4, 2026 00:16
Signed-off-by: Terry Bai <tianyi.bai@unsw.edu.au>

@Courtney3141 left a comment

Contributor


I will see if I can tidy up the changes to the general echo server code; could you address the minor comments on the Ethernet driver?

Also, could you please open an issue for the lwIP error message we get, so we can look into it later?

Once I am done with my changes, could you please review them? Then we can merge 👍

Review comment threads: drivers/network/genet/ethernet.h (4, all outdated); drivers/network/genet/ethernet.c (4, 3 outdated)
@terryzbai force-pushed the rpi4_genet_driver branch 2 times, most recently from 1862749 to 7cfe7f4 March 31, 2026 00:26
terryzbai and others added 4 commits March 31, 2026 16:34
This GENET driver is derived from the Linux, U-Boot and RT-Thread
source code due to the lack of public documentation. We use only
the default ring (i.e. 16) for both Rx and Tx for simplicity.

Signed-off-by: Terry Bai <tianyi.bai@unsw.edu.au>
The rpi4 GENET hardware requires a pseudo-header checksum
calculated by software, and a 64-byte configuration space
prepended to each Rx/Tx packet, which means the actual payload
is shifted to offset 64.

For now, the pseudo-header checksum is implemented in a hacky
way for this special case.

Signed-off-by: Terry Bai <tianyi.bai@unsw.edu.au>
Co-authored-by: Kurt Wu <rihui.wu@unsw.edu.au>
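A minimal sketch of the buffer layout this commit describes: a 64-byte block in front of every frame, with the payload at offset 64. Only the 64-byte size comes from the commit message; the field names are an assumption loosely modelled on the Linux bcmgenet driver's struct status_64, and genet_prepare_tx() and genet_rx_payload() are hypothetical helpers. Filling in the Tx checksum-info field is deliberately left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* ASSUMED layout, loosely following Linux bcmgenet's struct status_64.
 * Only the 64-byte total size is taken from the commit message. */
struct genet_status_64 {
    uint32_t length_status;  /* length and status bits */
    uint32_t ext_status;
    uint32_t rx_csum;        /* Rx checksum result */
    uint32_t unused1[9];
    uint32_t tx_csum_info;   /* Tx checksum offload parameters */
    uint32_t unused2[3];
};

_Static_assert(sizeof(struct genet_status_64) == 64, "status block must be 64 bytes");

#define GENET_PAYLOAD_OFFSET sizeof(struct genet_status_64)

/* Copy frame into dma_buf behind a zeroed 64-byte status block.
 * Returns the length to hand to the device, or 0 if the frame does not fit. */
static size_t genet_prepare_tx(uint8_t *dma_buf, size_t buf_len,
                               const uint8_t *frame, size_t frame_len)
{
    assert(dma_buf != NULL && frame != NULL);
    if (buf_len < GENET_PAYLOAD_OFFSET || frame_len > buf_len - GENET_PAYLOAD_OFFSET) {
        return 0;
    }
    memset(dma_buf, 0, GENET_PAYLOAD_OFFSET);  /* tx_csum_info etc. left zero here */
    memcpy(dma_buf + GENET_PAYLOAD_OFFSET, frame, frame_len);
    return GENET_PAYLOAD_OFFSET + frame_len;
}

/* On Rx, the received frame starts after the status block. */
static const uint8_t *genet_rx_payload(const uint8_t *dma_buf)
{
    return dma_buf + GENET_PAYLOAD_OFFSET;
}
```

Copying is only one option; reserving the first 64 bytes of each sDDF buffer and having the client write the frame at offset 64 avoids the memcpy, at the cost of every buffer being 64 bytes larger.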
Signed-off-by: Terry Bai <tianyi.bai@unsw.edu.au>
Signed-off-by: Terry Bai <tianyi.bai@unsw.edu.au>
… of buffer for Rx, use lib sDDF lwIP intercept tx support for adding checksum metadata before packet

Signed-off-by: Courtney Darville <courtneydarville94@outlook.com>
Signed-off-by: Courtney Darville <courtneydarville94@outlook.com>
Signed-off-by: Courtney Darville <courtneydarville94@outlook.com>
Comment on lines +71 to +77
#if defined(CONFIG_PLAT_BCM2711)
sddf_lwip_init(&lib_sddf_lwip_config, &net_config, &timer_config, net_rx_handle, net_tx_handle, NULL, NULL,
netif_status_callback, NULL, pbuf_needs_checksum, add_checksum_and_transmit);
#else
sddf_lwip_init(&lib_sddf_lwip_config, &net_config, &timer_config, net_rx_handle, net_tx_handle, NULL, NULL,
netif_status_callback, NULL, NULL, NULL);
#endif

Collaborator


@Courtney3141 I talked to Terry a bit about this and he mentioned you prefer supporting the pseudo checksum at the application level rather than in the library.

Doing it here means that every networking application would need an ifdef for platforms without full hardware checksum offload, and I don't think that scales. It also means that existing applications wouldn't work with the BCM ethernet driver until we explicitly add support for it.

He mentioned you had concerns about putting this kind of functionality in the library itself, can you explain that?

Contributor


I saw lib sDDF lwIP simply as a way to bridge the gap between lwIP and the sDDF queues and signalling protocol, so I didn't like the idea of baking platform-specific checksum metadata and packet rearrangement into the library.

However, I see your point. After thinking it through, I think we need a better solution: baking it into the library won't work in all cases either; in particular, it won't work for the vswitch case. I think what we want is the option of whether to use checksum offload, configurable through the metaprogram. Let's discuss when we're back at work.
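As a rough illustration of that configurable alternative to a per-platform #if, the hooks passed to sddf_lwip_init could be chosen at runtime from a flag emitted by the metaprogram. Everything below (struct net_offload_config, tx_needs_pseudo_csum, the callback typedefs and placeholder callbacks) is hypothetical and does not reflect the existing lib sDDF lwIP API; the real callbacks' signatures are not shown in this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* HYPOTHETICAL flag the metaprogram could emit into the client's network config. */
struct net_offload_config {
    bool tx_needs_pseudo_csum;  /* e.g. true for GENET-style partial offload */
};

/* HYPOTHETICAL callback shapes; the real lib sDDF lwIP signatures may differ. */
typedef bool (*needs_csum_fn)(const void *pbuf);
typedef int (*transmit_fn)(void *pbuf);

struct tx_hooks {
    needs_csum_fn pbuf_needs_checksum;  /* NULL means "use the default path" */
    transmit_fn transmit;
};

/* Placeholders standing in for the application's real callbacks. */
static bool always_needs_csum(const void *pbuf) { (void)pbuf; return true; }
static int add_pseudo_csum_and_transmit(void *pbuf) { (void)pbuf; return 1; }

/* Choose hooks from configuration at runtime instead of an #if per platform. */
static struct tx_hooks select_tx_hooks(const struct net_offload_config *cfg)
{
    assert(cfg != NULL);
    struct tx_hooks hooks = { NULL, NULL };
    if (cfg->tx_needs_pseudo_csum) {
        hooks.pbuf_needs_checksum = always_needs_csum;
        hooks.transmit = add_pseudo_csum_and_transmit;
    }
    return hooks;
}
```

The application would then pass hooks.pbuf_needs_checksum and hooks.transmit where the snippet above passes pbuf_needs_checksum and add_checksum_and_transmit (or NULL, NULL), so a single code path serves every platform and the vswitch case can simply set the flag to false.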

@Courtney3141 marked this pull request as draft April 3, 2026 05:20


3 participants