Skip to content

Race Condition in Solution to Chapter 10, Exercise 4 #28

Description

@asatpathy314

Describe the bug
In chapter 10 chapter10/README.md, I think there's a race condition in the solution for exercise 4.

__global__ void CoarsenedMaxReductionKernel(float* input, float* output) {
    __shared__ float input_s[BLOCK_DIM];
    unsigned int segment = COARSE_FACTOR*2*blockDim.x*blockIdx.x;
    unsigned int i = segment + threadIdx.x;
    unsigned int t = threadIdx.x;
    float maximum_value = input[i];
    for(unsigned int tile = 1; tile < COARSE_FACTOR*2; ++tile) {
        maximum_value = fmax(maximum_value, input[i + tile*BLOCK_DIM]);
    }
    input_s[t] = maximum_value;

    for (unsigned int stride = blockDim.x/2; stride >= 1; stride /= 2){
        __syncthreads();
        if (t < stride) {
            input_s[t] = fmax(input_s[t], input_s[t + stride]);
        }
    }
    if (t == 0) {
        atomicExch(output, fmax(*output, input_s[0]));
    }
}

Note that in the line atomicExch(output, fmax(*output, input_s[0]));. I imagine that it's possible for something like the following sequence of events to happen between example blocks A and B (assuming *output = -inf for convience)

  1. A: reads *output = -inf
  2. A: computes candA = fmax(-inf, 3) = 3
  3. B: reads *output = -inf
  4. B: computes candB = fmax(-inf, 5) = 5
  5. B: atomicExch(output, 5) -> *output becomes 5
  6. A: atomicExch(output, 3) -> *output becomes 3 (wrong)

Proposed solution
You can use CUDA's atomicMax operation. (i.e. atomicMax(output, input_s[0]);).

Metadata

Metadata

Assignees

Labels

No labels
No labels

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions