cuda - CUDA threads appear not to be synchronized

Question

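I have a problem where one thread seems to lag behind the others, even though I am using __syncthreads(). The excerpt below comes from a much larger program. I have cut out as much as I can, and it still reproduces the problem. When I run this code, the test4 array does not get the same value from every thread. My understanding is that the TEST_FLAG variable should lead all threads into the if (TEST_FLAG == 2) block, so every element of test4 should end up as 43. Instead, every element is 43 except the one written by thread 0, which is 0. It looks as though the threads are not all reaching the same __syncthreads(). I have run many tests and found that removing more of the code, for example the for (l=0; l<1; ++l) loop, makes the problem go away, but I do not understand why. Any help explaining why my threads do not all return the same value would be greatly appreciated.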

import numpy as np
import pycuda.driver as drv
import pycuda.compiler
import pycuda.autoinit
import pycuda.gpuarray as gpuarray
import pycuda.cumath as cumath
from pycuda.compiler import SourceModule

gpu_code=SourceModule("""
    __global__ void test_sync(double *test4, double *test5)
    {
        __shared__ double rad_loc[2], boundary[2], boundary_limb_edge[2];
        __shared__ int TEST_FLAG;
        int l;


        if (blockIdx.x != 0)
        {
            return;
        }

        if(threadIdx.x == 0)
        {
            TEST_FLAG = 2;
            boundary[0] = 1;
        }

        test4[threadIdx.x] = 0;
        test5[threadIdx.x] = 0;

        if (threadIdx.x == 0)
        {
            rad_loc[0] = 0.0;
        }
        __syncthreads();

        for (l=0; l<1; ++l)
        {
            __syncthreads();
            if (rad_loc[0] > 0.0)
            {
                test5[threadIdx.x] += 1;

                if ((int)boundary[0] == -1)
                {
                    __syncthreads();
                    continue;
                }
            }
            else
            {
                if (threadIdx.x == 0)
                {
                    boundary_limb_edge[0] = 0.0;
                }
            }
            __syncthreads();

            if (TEST_FLAG == 2)
            {
                test4[threadIdx.x] = 43;
                __syncthreads();

                TEST_FLAG = 99;
            }
            __syncthreads();

            return;
        }
        return;
    }

    """)


test_sync = gpu_code.get_function("test_sync")

DATA_ROWS=[100,100]

blockshape_data_mags    = (int(64),1, 1)
gridshape_data_mags     = (int(sum(DATA_ROWS)), 1)

test4 = np.zeros([1*blockshape_data_mags[0]], np.float64)
test5 = np.zeros([1*blockshape_data_mags[0]], np.float64)

test_sync(drv.InOut(test4), drv.InOut(test5), block=blockshape_data_mags, grid=gridshape_data_mags)

print(test4)
print(test5)

score 1 · Accepted Answer

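Your problem is the statement TEST_FLAG=99. TEST_FLAG is a __shared__ variable, so a write from any thread is visible to the whole block. For one of the threads, that write happens before thread 0 has entered the conditional block, which gives you undefined behavior. If I comment out TEST_FLAG=99, the code runs as expected.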
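If you need to keep the flag update rather than comment it out, one common pattern (not the answerer's code) is: every thread copies the shared flag into a local variable, all threads meet at a barrier, and only then does a single thread (index 0) overwrite it. The sketch below is a CPU analogy under stated assumptions: Python's threading.Barrier stands in for __syncthreads(), a dict stands in for the __shared__ TEST_FLAG, and run_block is a hypothetical helper. In the kernel the equivalent would be reading TEST_FLAG into a local int, calling __syncthreads() unconditionally, and guarding TEST_FLAG = 99 with threadIdx.x == 0.

```python
import threading

NUM_THREADS = 64  # mirrors the 64-thread block in the question


def run_block():
    shared = {"TEST_FLAG": 2}  # stand-in for the __shared__ int (assumption)
    test4 = [0] * NUM_THREADS
    barrier = threading.Barrier(NUM_THREADS)  # stand-in for __syncthreads()

    def worker(tid):
        flag = shared["TEST_FLAG"]  # each thread snapshots the flag locally
        barrier.wait()              # every read finishes before any write
        if flag == 2:
            test4[tid] = 43
        if tid == 0:
            shared["TEST_FLAG"] = 99  # single writer, only after the barrier
        barrier.wait()              # publish the update before anyone reuses it

    threads = [threading.Thread(target=worker, args=(t,)) for t in range(NUM_THREADS)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return test4, shared["TEST_FLAG"]


test4, final_flag = run_block()
print(all(v == 43 for v in test4), final_flag)  # prints: True 99
```

Because each thread branches on its own local copy, a later write to the shared flag cannot change which path a slower thread takes.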

score 1 · Accepted Answer

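As Yuuta mentioned, the behavior of __syncthreads() inside conditional statements is not guaranteed, so it may or may not work as expected. More precisely, __syncthreads() is only allowed in conditional code if the condition evaluates identically across the entire thread block; otherwise execution may hang or produce unintended side effects. You may need to rewrite your code so that __syncthreads() is not inside your if conditions.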
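To illustrate why the condition must be uniform, here is a CPU analogy (an assumption, not GPU code) with Python's threading.Barrier in place of __syncthreads(). The divergent condition tid != 0 and the helper run are hypothetical. When the barrier sits inside a branch that one thread skips, the other threads wait for a participant that never arrives. Python's Barrier can detect this through its timeout; a GPU performs no such check. Hoisting the barrier out of the if, as suggested above, is the barrier_inside_if=False case.

```python
import threading

NUM_THREADS = 8  # hypothetical small "block" so the example runs quickly


def run(barrier_inside_if):
    # The timeout lets Python report a barrier that not every thread reaches;
    # __syncthreads() gives no such report.
    barrier = threading.Barrier(NUM_THREADS, timeout=2.0)
    out = [0] * NUM_THREADS
    broken = []

    def worker(tid):
        take_branch = tid != 0  # hypothetical divergent condition: thread 0 skips it
        try:
            if take_branch:
                out[tid] = 43
                if barrier_inside_if:
                    barrier.wait()  # barrier inside the if: only 7 of 8 threads arrive
            if not barrier_inside_if:
                barrier.wait()      # barrier hoisted out of the if: all threads arrive
        except threading.BrokenBarrierError:
            broken.append(tid)

    threads = [threading.Thread(target=worker, args=(t,)) for t in range(NUM_THREADS)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return out, bool(broken)


diverged = run(barrier_inside_if=True)
uniform = run(barrier_inside_if=False)
print(diverged[1])  # True: the barrier timed out and was broken
print(uniform[1])   # False: all threads met at the barrier
```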

You can check this answer and this article for more information about __syncthreads().

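It is also important to note that __syncthreads() is a block-level barrier. You cannot use __syncthreads() to synchronize threads in different blocks; blocks must be synchronized through separate kernel calls.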
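A minimal sketch of synchronizing across blocks by splitting the work into two launches, written as a CPU analogy in plain Python. The function names block_partial_sum and final_sum, the sizes, and the use of ThreadPoolExecutor are assumptions for illustration. In PyCUDA this would be two kernels launched one after another on the same stream; the second does not start until every block of the first has finished. Here, pool.map returning all of its results plays the role of that launch boundary.

```python
from concurrent.futures import ThreadPoolExecutor

NUM_BLOCKS = 4   # hypothetical grid size
BLOCK_SIZE = 64  # hypothetical block size
data = list(range(NUM_BLOCKS * BLOCK_SIZE))


def block_partial_sum(block_id):
    # "Kernel 1": each block reduces only its own slice. It must not read
    # other blocks' results, because there is no barrier across blocks.
    start = block_id * BLOCK_SIZE
    return sum(data[start:start + BLOCK_SIZE])


def final_sum(partials):
    # "Kernel 2": runs only after every block of kernel 1 has finished,
    # so all partial sums are available.
    return sum(partials)


with ThreadPoolExecutor(max_workers=NUM_BLOCKS) as pool:
    # list() blocks until every "block" has produced its result
    partials = list(pool.map(block_partial_sum, range(NUM_BLOCKS)))

total = final_sum(partials)
print(partials, total)  # prints: [2016, 6112, 10208, 14304] 32640
```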
