In CUDA, do non-coalesced memory accesses cause branch divergence?
I always thought that branch divergence is caused only by control-flow
code, such as "if", "else", "for", "switch", etc. However, I recently read
a paper that says:
"One can clearly observe that the number of divergent branches taken by
threads in each first exploration-based algorithm is at least twice more
important than the full exploration strategy. This is typically the
results from additional non-coalesced accesses to the global memory.
Hence, such a threads divergence leads to many memory accesses that have
to be serialized, increasing the total number of instructions executed.
One can observe that the number of warp serializations for the version
using non-coalesced accesses is between seven and sixteen times more
important than for its counterpart. Indeed, a threads divergence caused by
non-coalesced accesses leads to many memory accesses that have to be
serialized, increasing the instructions to be executed."
According to the author, then, non-coalesced accesses can cause divergent
branches. Is that true? More generally, what exactly can cause branch
divergence? Thanks in advance.
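
To make the distinction concrete, here is a minimal sketch of the two
patterns as I understand them. It is not taken from the paper: the kernel
names, the helper run_demo, the block size of 256 and the stride parameter
are all hypothetical, chosen only for illustration. The first kernel
branches on the lane index, so threads of one warp follow different paths.
The second has no data-dependent branch at all, but neighbouring threads
read addresses that are far apart, which is what I understand by a
non-coalesced access.

```cuda
#include <vector>
#include <cuda_runtime.h>

// Hypothetical kernel A: control-flow divergence. Even and odd lanes of the
// same warp take different sides of the if/else. (For a body this short the
// compiler may emit predicated instructions instead of a real branch.)
__global__ void divergent_branch(const float* in, float* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;  // bounds guard; can only split the last warp
    if (threadIdx.x % 2 == 0) {
        out[i] = in[i] * 2.0f;
    } else {
        out[i] = in[i] + 1.0f;
    }
}

// Hypothetical kernel B: every thread executes the same instructions, but
// consecutive threads read elements `stride` apart, so one warp-wide load
// touches many separate memory segments instead of one contiguous range.
__global__ void strided_read(const float* in, float* out, int n, int stride) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;  // bounds guard; can only split the last warp
    int j = static_cast<int>((static_cast<long long>(i) * stride) % n);
    out[i] = in[j] * 2.0f;
}

// Runs both kernels and returns the number of results that differ from a
// host-side reference, or -1 if a CUDA call fails.
int run_demo(int n, int stride) {
    std::vector<float> h_in(n), h_a(n), h_b(n);
    for (int i = 0; i < n; ++i) h_in[i] = static_cast<float>(i % 1000);

    size_t bytes = static_cast<size_t>(n) * sizeof(float);
    float *d_in = nullptr, *d_a = nullptr, *d_b = nullptr;
    bool ok = cudaMalloc(&d_in, bytes) == cudaSuccess &&
              cudaMalloc(&d_a, bytes) == cudaSuccess &&
              cudaMalloc(&d_b, bytes) == cudaSuccess &&
              cudaMemcpy(d_in, h_in.data(), bytes,
                         cudaMemcpyHostToDevice) == cudaSuccess;
    if (ok) {
        int threads = 256;  // even, so i % 2 == threadIdx.x % 2
        int blocks = (n + threads - 1) / threads;
        divergent_branch<<<blocks, threads>>>(d_in, d_a, n);
        strided_read<<<blocks, threads>>>(d_in, d_b, n, stride);
        ok = cudaDeviceSynchronize() == cudaSuccess &&
             cudaMemcpy(h_a.data(), d_a, bytes,
                        cudaMemcpyDeviceToHost) == cudaSuccess &&
             cudaMemcpy(h_b.data(), d_b, bytes,
                        cudaMemcpyDeviceToHost) == cudaSuccess;
    }
    cudaFree(d_in);  // cudaFree(nullptr) is a no-op
    cudaFree(d_a);
    cudaFree(d_b);
    if (!ok) return -1;

    int mismatches = 0;
    for (int i = 0; i < n; ++i) {
        float exp_a = (i % 2 == 0) ? h_in[i] * 2.0f : h_in[i] + 1.0f;
        long long j = (static_cast<long long>(i) * stride) % n;
        float exp_b = h_in[j] * 2.0f;
        if (h_a[i] != exp_a) ++mismatches;
        if (h_b[i] != exp_b) ++mismatches;
    }
    return mismatches;
}
```

Calling run_demo(1 << 20, 32) from main and comparing the profiler's
divergent-branch counter for the two kernels would be one way to check
whether kernel B is really reported as divergent, or whether its cost shows
up only as extra memory transactions.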