# Importance of CUDA volatile keyword with shared memory

When coordinating information between threads in a warp via shared memory in CUDA, you might need to use the `volatile` keyword when declaring the shared memory array, as in `volatile __shared__ int array[SIZE]`.

In particular, when one thread in the warp (I like to call it the “warp leader”) writes to a location in shared memory that the rest of the warp threads (the “warp workers”) need to read, I found that the `volatile` keyword is required for all the workers to see the written value. Without it, the compiler is free to cache the shared-memory location in a register, so a worker can end up reading a stale value.

For example:

`volatile __shared__ int array[NUM_WARPS];`

I should note that this could be avoided by syncing all the threads in the block with `__syncthreads()`, but in my case I want only the warp workers to be synced; I don't need a barrier across the entire block.
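A minimal sketch of the leader/worker pattern described above (the kernel name, the value written, and `NUM_WARPS` are illustrative, not from the original post; I'm assuming a block of `NUM_WARPS * 32` threads). On recent architectures with independent thread scheduling, `__syncwarp()` gives a warp-only barrier, which matches the goal of not syncing the whole block:

```cuda
#include <cuda_runtime.h>

#define WARP_SIZE 32
#define NUM_WARPS 4

__global__ void broadcastInWarp(int *out)
{
    // volatile keeps the compiler from caching the slot in a register,
    // so every worker re-reads shared memory on each access.
    volatile __shared__ int array[NUM_WARPS];

    int lane   = threadIdx.x % WARP_SIZE;  // position within the warp
    int warpId = threadIdx.x / WARP_SIZE;  // which warp this thread belongs to

    if (lane == 0)                 // the "warp leader" writes
        array[warpId] = warpId * 100;

    __syncwarp();                  // warp-level barrier, not a block-wide one

    // the "warp workers" read the leader's value
    out[threadIdx.x] = array[warpId];
}
```

Launched as, say, `broadcastInWarp<<<1, NUM_WARPS * WARP_SIZE>>>(d_out);`, every thread in warp `w` should observe the value its leader wrote. The `__syncwarp()` here covers the ordering between the leader's write and the workers' reads without paying for a full `__syncthreads()`.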