Race Condition in Solution to Chapter 10, Exercise 4

**Describe the bug**
In chapter 10 `chapter10/README.md`, I think there's a race condition in the solution for exercise 4. 

```cu
__global__ void CoarsenedMaxReductionKernel(float* input, float* output) {
    __shared__ float input_s[BLOCK_DIM];
    unsigned int segment = COARSE_FACTOR*2*blockDim.x*blockIdx.x;
    unsigned int i = segment + threadIdx.x;
    unsigned int t = threadIdx.x;
    float maximum_value = input[i];
    for(unsigned int tile = 1; tile < COARSE_FACTOR*2; ++tile) {
        maximum_value = fmax(maximum_value, input[i + tile*BLOCK_DIM]);
    }
    input_s[t] = maximum_value;

    for (unsigned int stride = blockDim.x/2; stride >= 1; stride /= 2){
        __syncthreads();
        if (t < stride) {
            input_s[t] = fmax(input_s[t], input_s[t + stride]);
        }
    }
    if (t == 0) {
        atomicExch(output, fmax(*output, input_s[0]));
    }
}
```

Note that in the line `atomicExch(output, fmax(*output, input_s[0]));`. I imagine that it's possible for something like the following sequence of events to happen between example blocks A and B (assuming `*output = -inf` for convience)

1. A: reads *output = -inf
2. A: computes candA = fmax(-inf, 3) = 3
3. B: reads *output = -inf
4. B: computes candB = fmax(-inf, 5) = 5
5. B: atomicExch(output, 5) -> *output becomes 5
6. A: atomicExch(output, 3) -> *output becomes 3 (wrong)

**Proposed solution**
You can use CUDA's `atomicMax` operation. (i.e. `atomicMax(output, input_s[0]);`).

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Race Condition in Solution to Chapter 10, Exercise 4 #28

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Race Condition in Solution to Chapter 10, Exercise 4 #28

Description

Metadata

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Issue actions