feat(packetparser): Allow sampling of packets #1767
Conversation
ebac9e3 to
b2a5ab5
Compare
|
@mmckeen I will try to make a pass this week |
nddq
left a comment
I think we should revisit the syntax of DataSamplingRate. For example, since I know the code context, I understand that setting DataSamplingRate to 1 means every packet will be sampled. But for someone new to Retina, this behavior may not be obvious.
|
Also, I'm all for ditching the reporting period entirely if we can make packet sampling work, as it is a lot more flexible than the hardcoded value that we currently use for the reporting period 🙂 cc @anubhabMajumdar @SRodi So what I'm thinking is that we report a packet either because it carries a new flag or because it was randomly sampled. A more advanced behavior I can think of is monitoring CPU usage, dynamically adjusting the sampling rate, and recompiling the packet parser program accordingly. |
The reporting interval might still be valid to enable us to make sure we report eventually for every 5-tuple. For example, we wouldn't want to sample and then never report the counters for a particular connection. The reporting interval being something smallish also ensures Prometheus |
|
This PR will be closed in 7 days due to inactivity. |
|
Apologies, been a while for me to circle back on this. I'll try and address the comments this week 🙇 |
5ddf47d to
49ed563
Compare
|
@nddq updated this with docs, the sampling calculation still looks accurate to me but happy for feedback 🙇 |
|
@mmckeen thanks for the docs! After re-reading it, the sampling calculation makes more sense to me now (I thought by default, the |
|
This PR will be closed in 7 days due to inactivity. |
|
hey @mmckeen , I just merged some changes for conntrack, can you rebase your PR? Thanks! |
49ed563 to
a7e95f4
Compare
Done! |
Signed-off-by: Matthew McKeen <matthew.mckeen@fastly.com>
a7e95f4 to
12c6e02
Compare
# Description

This PR fixes unit test failures in the `packetparser` plugin:

- Mocks `metrics.ParsedPacketsCounter` in `TestReadDataPodLevelEnabled` to prevent a nil pointer dereference during test execution.
- Updates the expected dynamic header content in `TestPacketParseGenerate` to match the actual generated output, which now includes `DATA_SAMPLING_RATE` and `BYPASS_LOOKUP_IP_OF_INTEREST` definitions.

## Related PRs

This test was broken in the following PRs:

- #624
- #1767

## Checklist

- [x] I have read the [contributing documentation](https://retina.sh/docs/Contributing/overview).
- [x] I signed and signed-off the commits (`git commit -S -s ...`). See [this documentation](https://docs.github.com/en/authentication/managing-commit-signature-verification/about-commit-signature-verification) on signing commits.
- [x] I have correctly attributed the author(s) of the code.
- [x] I have tested the changes locally.
- [x] I have followed the project's style guidelines.
- [x] I have updated the documentation, if necessary.
- [x] I have added tests, if applicable.

## Screenshots (if applicable) or Testing Completed

<img width="1624" height="1343" alt="image" src="https://github.com/user-attachments/assets/9549ed15-7c62-4537-a806-2d2368d3fc66" />

## Additional Notes

We have a bigger issue: CI does not show a failure when a single test fails. We will look into that and create another PR to fix it. See this example (FAIL not caught), reference #1688: https://github.com/microsoft/retina/actions/runs/20115412850/job/57723528708#step:4:3145

---

Please refer to the [CONTRIBUTING.md](../CONTRIBUTING.md) file for more information on how to contribute to this project.
Description
This PR allows for optional sampling of packet reporting when in high data aggregation level for `packetparser`. By default, all packets are reported, but optionally 1 out of `n` packets is sampled by random chance, with the exception of packets that carry certain important control flags or that hit the reporting interval. This allows Retina to scale to high network volume environments at the trade-off of some reporting granularity.
The performance impact of this is mostly for workloads with lots of new connections; connections already tracked in the conntrack table rely on #1665 for scalability.
The behavior added in #1665 allows for accurate reporting of metrics despite sampling being in place.
Related Issue
#1760
Checklist
- I signed and signed-off the commits (`git commit -S -s ...`). See this documentation on signing commits.
Screenshots (if applicable) or Testing Completed
Main
After the change (with default sampling rate of 1)
After the change (with sampling rate of 1000)
Please refer to the CONTRIBUTING.md file for more information on how to contribute to this project.