Add rpi4 genet driver#650
Conversation
cb0f18d to
72ba56b
Compare
Signed-off-by: Terry Bai <tianyi.bai@unsw.edu.au>
72ba56b to
b604a8f
Compare
Courtney3141
left a comment
There was a problem hiding this comment.
I will see if I can tidy up the changes to the general echo server code, if you could address the minor comments on the Ethernet driver.
Also, if you could please add an issue for the lwip error message we get, so we can look into that later.
Once I have done with my changes, if you could review them please, then we can merge 👍
1862749 to
7cfe7f4
Compare
This GENET driver is derived from Linux, U-boot and RT-Thread source code due to lack of public documentation. We use only the default ring (i.e. 16) for both Rx and Tx for simplification. Signed-off-by: Terry Bai <tianyi.bai@unsw.edu.au>
The rpi4 GENET hardware requires a pseudo header checksum calculated by software, and 64-bytes pre-appended configuration space for each of Rx/Tx packets, which means the actual payload is shifted to the offset 64. For now, the pseudo checksum is just hackly implemented for this special case. Signed-off-by: Terry Bai <tianyi.bai@unsw.edu.au>
Co-authored-by: Kurt Wu <rihui.wu@unsw.edu.au> Signed-off-by: Terry Bai <tianyi.bai@unsw.edu.au>
Signed-off-by: Terry Bai <tianyi.bai@unsw.edu.au>
7cfe7f4 to
4230b07
Compare
… of buffer for Rx, use lib sDDF lwIP intercept tx support for adding checksum metadata before packet Signed-off-by: Courtney Darville <courtneydarville94@outlook.com>
| #if defined(CONFIG_PLAT_BCM2711) | ||
| sddf_lwip_init(&lib_sddf_lwip_config, &net_config, &timer_config, net_rx_handle, net_tx_handle, NULL, NULL, | ||
| netif_status_callback, NULL, pbuf_needs_checksum, add_checksum_and_transmit); | ||
| #else | ||
| sddf_lwip_init(&lib_sddf_lwip_config, &net_config, &timer_config, net_rx_handle, net_tx_handle, NULL, NULL, | ||
| netif_status_callback, NULL, NULL, NULL); | ||
| #endif |
There was a problem hiding this comment.
@Courtney3141 I talked to Terry a bit about this and he mentioned you prefer supporting the pseudo checksum at the application level rather than in the library.
Doing it here means that every single networking application would have to have an ifdef for platforms that don't have full hardware checksum and I don't think that really scales. It also means that existing applications wouldn't work with the BCM ethernet driver until we explicitly add support for it.
He mentioned you had concerns about putting this kind of functionality in the library itself, can you explain that?
There was a problem hiding this comment.
I saw lib sDDF lwIP simply as a way to bridge the gap between lwIP and sDDF queues and the signalling protocol. So I didn't like the idea of putting a single platform specific checksum metadata and packet rearrangement baked into the library.
However, I see the point you are making. After thinking it through, I think we need a better solution to this - baking it into the library won't work in all cases either, in particular it won't work for the vswitch case. I think what we want is the option to utilise checksum offload or not, which could be configured using the metaprogram. Let's discuss when we return to work.
The implementation mainly refers to Linux, U-Boot and RT-Thread source code due to lack of documentation. Compared to Linux using the second IRQs for seperate rings, we use only the default ring (i.e., ring 16) and the first IRQ for Rx/Tx status update.
The GENET NIC supports only one checksum which can be either network layer (e.g., IP) or transport layer (e.g., TCP/UDP) at a time, which means the checksum of another layer needs to be calculated in software. To enable hardware checksum offload, a 64-byte space needs to be pre-appended to each of Rx/Tx packets, causing the actual payload to be shifted to the byte 64. Also, a pseudo header checksum needs to be calculated by software (with a constant time cost) and filled in the checksum field. A rough benchmark shows that handling TCP/UDP checksum calculation to hardware saves around 8.5% total CPU Utilisation.
Unlike most of other NICs, the driver needs to explicitly doorbell the device by updating
prod_indexof Tx ring orcons_indexof Rx ring once there are some work to do. However, each doorbell takes over 300 cycles, so the hardware would have likely finished the work before jumping out from the whole loop inhandle_irq(), undermining the batching of the packets.The basic benchmark results: (The data has been added to the internal benchmark spreadsheets)
A tiny issue occurred during the full benchmark:
memp_malloc: out of memory in pool TCP_PCBis printed after exactly 10 benchmarks, but could be fixed in another commit later.