KIP-932: Implement destroy APIs #5406
1. src/rdkafka_broker.c: Add rd_kafka_broker_share_fetch_session_clear()
   to the broker thread exit path. This prevents an assertion failure in
   rd_kafka_broker_destroy_final when toppars_to_forget is non-NULL
   after a broker is decommissioned during close.
2. tests/0178-share_consumer_close.c:
   - Uncomment test_close_with_broker_down_is_fatal_cb
   - Add delayed_down_state struct + delayed_down_cb background thread
   - Add test_close_with_broker_decommission: validates that close()
     completes when a broker goes down mid-close
   - Register test in main_0178_share_consumer_close_local

Note: the test passes on both branches (with/without the rko_op_cb fix)
because set_down triggers __STATE (not __DESTROY); decommission
is too slow to race ahead of the leave op processing. A deterministic
test needs rd_kafka_mock_broker_remove_from_metadata (PR #5406) to
decouple metadata removal from connection drop.
Pranav Rathi (pranavrth) left a comment
Added comments for the implementation part. I will go through the tests in another pass.
| * Step 1: Dispatch acknowledgement callbacks. | ||
| * Per-partition errors were set by the broker thread. | ||
| */ | ||
| rd_kafka_share_dispatch_ack_callbacks( |
Reminder: let's add ack callback tests as well.
| */ | ||
| rd_kafka_resp_err_t | ||
| rd_kafka_share_consumer_closed_err(rd_kafka_share_t *rkshare); | ||
|
|
| test_ack_cb_state_t state; | ||
| int consumed = 0; | ||
| int attempts = 0; | ||
| test_ack_cb_state_t state = {0}; |
The uninitialized state was causing the tests to fail on the feature branch. It should be fixed now.
| ack_cb_state_t state; | ||
| int consumed = 0; | ||
| int attempts = 0; | ||
| ack_cb_state_t state = {0}; |
Same as above
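For context on the `= {0}` change, here is a minimal sketch of why the initializer matters. The struct and callback below are hypothetical stand-ins (the real `test_ack_cb_state_t` layout is not shown in this thread); the point is only that a callback which increments counters needs them to start at zero, and that `= {0}` zero-initializes every member of an automatic struct.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the test's callback state. The field names
 * are assumptions, not the actual test_ack_cb_state_t layout. */
typedef struct demo_ack_cb_state_s {
        int callback_cnt;  /* number of callback invocations */
        int offsets_acked; /* total offsets reported */
        int last_err;      /* last error code seen */
} demo_ack_cb_state_t;

/* Hypothetical callback: it only increments, it never initializes,
 * so stale stack contents would leak into the counters. */
static void demo_ack_cb(demo_ack_cb_state_t *state, int offset_cnt, int err) {
        assert(state != NULL);
        state->callback_cnt++;
        state->offsets_acked += offset_cnt;
        state->last_err = err;
}

/* Run two callbacks against a zero-initialized state and return it. */
static demo_ack_cb_state_t demo_run(void) {
        /* {0} sets the first member to 0 and zero-initializes the rest.
         * A plain "demo_ack_cb_state_t state;" would leave all members
         * indeterminate. */
        demo_ack_cb_state_t state = {0};

        demo_ack_cb(&state, 3, 0);
        demo_ack_cb(&state, 2, 0);
        return state;
}
```

With the initializer in place, a loop condition such as `state.callback_cnt < 1` behaves the same on every run instead of depending on whatever was on the stack.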
| while ((rcvd < 4 || state.callback_cnt < 1) && attempts-- > 0) { | ||
| size_t batch_rcvd = 0; | ||
| err = rd_kafka_share_consume_batch(rkshare, 2000, batch, | ||
| err = rd_kafka_share_consume_batch(rkshare, 2000, batch + rcvd, |
If the messages are not all redelivered in a single consume_batch call, some of them might leak. I observed this issue in one of the runs.
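The `batch + rcvd` pattern can be sketched in isolation as below. This does not call librdkafka: `demo_consume_batch` is a hypothetical stand-in that delivers at most two items per call to mimic partial redelivery, and plain integers stand in for message pointers. Passing `batch + rcvd` makes each call append after the entries already received, so earlier entries are not overwritten (which, with real message pointers, would leave them unreachable and never destroyed).

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_TOTAL 4

/* Hypothetical stand-in for a batch-consume call: writes up to `max`
 * message ids into `out` and returns how many were written. It delivers
 * at most 2 per call to mimic partial redelivery. */
static size_t demo_consume_batch(int *next_id, int *out, size_t max) {
        size_t n = 0;

        while (n < max && n < 2 && *next_id < DEMO_TOTAL)
                out[n++] = (*next_id)++;
        return n;
}

/* Collect messages across several calls, appending at batch + rcvd so
 * entries from earlier calls stay in place. Returns the total count. */
static size_t demo_collect(int *batch, size_t capacity) {
        int next_id  = 0;
        size_t rcvd  = 0;
        int attempts = 10;

        assert(batch != NULL);
        while (rcvd < capacity && attempts-- > 0) {
                /* Offset the destination and shrink the remaining room. */
                size_t got = demo_consume_batch(&next_id, batch + rcvd,
                                                capacity - rcvd);
                rcvd += got;
        }
        return rcvd;
}
```

Had each call written to `batch` instead of `batch + rcvd`, the second call would have overwritten slots 0 and 1 from the first.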
| } | ||
| } | ||
|
|
||
| void rd_kafka_share_acks_clear_during_broker_decommission( |
Add a test case for this flow.
| * apply the result to the in-flight commit_sync request (copies | ||
| * err into the cgrp's result list, decrements awaiting count, | ||
| * completes the sync if this was the last broker outstanding). */ | ||
| if (rkb->rkb_pending_commit_sync.sync_ack_details) { |
We need to dispatch callbacks as well.
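As a reference for the bookkeeping described in the code comment above (copy the error into the result list, decrement the awaiting count, complete the sync on the last outstanding broker), here is a simplified model. All names and the fixed-size result array are hypothetical; this is not the actual `rd_kafka_share_commit_sync_apply_result` implementation, and it leaves out locking and callback dispatch.

```c
#include <assert.h>
#include <stdbool.h>

#define DEMO_MAX_BROKERS 8

/* Hypothetical model of an in-flight commit_sync request. */
typedef struct demo_commit_sync_s {
        int awaiting;                  /* broker replies still outstanding */
        int result_cnt;                /* results recorded so far */
        int results[DEMO_MAX_BROKERS]; /* per-broker error codes */
        bool done;                     /* true once all brokers replied */
} demo_commit_sync_t;

/* Begin a sync that waits for `broker_cnt` broker replies. */
static void demo_commit_sync_start(demo_commit_sync_t *cs, int broker_cnt) {
        assert(broker_cnt > 0 && broker_cnt <= DEMO_MAX_BROKERS);
        cs->awaiting   = broker_cnt;
        cs->result_cnt = 0;
        cs->done       = false;
}

/* Apply one broker's result. Returns true if this reply was the last
 * one outstanding and therefore completed the sync. */
static bool demo_commit_sync_apply_result(demo_commit_sync_t *cs, int err) {
        assert(cs->awaiting > 0);
        cs->results[cs->result_cnt++] = err; /* copy err into result list */
        if (--cs->awaiting == 0)             /* last broker outstanding? */
                cs->done = true;
        return cs->done;
}
```

In this model a decommissioned broker still has to report a result (for example an error code) so that `awaiting` reaches zero; otherwise the sync would never complete.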
| } | ||
| } | ||
|
|
||
| void rd_kafka_share_acks_clear_during_broker_decommission( |
Add the broker ID to the logs as well.
| * (buf callback init -> INVALID_RECORD_STATE -> per- | ||
| * partition err or top-level err). */ | ||
| if (ack_details) { | ||
| if (ack_details && rko_orig->rko_err) { |
Move this to before the ack_details dispatch (step 1).
Add a test for this as well.
| rd_kafka_share_commit_sync_apply_result(rk, rkcg, ack_details); | ||
| rd_kafka_dbg( | ||
| rk, CGRP, "SHARE", | ||
| "Commit sync reply from broker %s: %s, " |
We can change wording for this.
| rd_atomic32_add(&rkb->termination_in_progress, 1); | ||
|
|
||
| if (RD_KAFKA_IS_SHARE_CONSUMER(rk) && | ||
| rkb->rkb_source == RD_KAFKA_LEARNED) { |
Let's verify this is correct.
| rd_bool_t should_fetch, | ||
| rd_bool_t should_leave); | ||
|
|
||
| void rd_kafka_share_commit_sync_apply_result(rd_kafka_t *rk, |
Check whether this should be moved to another file.
| int receipt_cnt; | ||
| int receipt_capacity; | ||
| int callback_invocations; | ||
| mtx_t lock; |
Check if it is needed.
| } | ||
| attempts++; | ||
| } | ||
| TEST_ASSERT(rcvd >= TEST_MSGS, |
| * | ||
| * surviving_partition: TEST_MSGS/2 offsets, NO_ERROR | ||
| * target_partition: TEST_MSGS/2 offsets, __DESTROY_BROKER */ | ||
| { |
This PR implements the following APIs related to share consumers:
- void rd_kafka_share_destroy(rd_kafka_share_t *rkshare);
- void rd_kafka_share_destroy_flags(rd_kafka_share_t *rkshare, int flags);

Note: handling of _DESTROY* in the acknowledgement callback will be covered in a separate PR.

Pending:
- RD_KAFKA_RESP_ERR__DESTROY check in rd_kafka_share_fetch_reply_op

Test Cases Added (so far):