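I am using PyOpenCL 2013.1. My code crashes on an NVIDIA GPU, an AMD CPU, and an AMD GPU, but runs on an Intel CPU.
On the NVIDIA GPU, queue.finish() fails after the kernel call with this error:
LogicError: clFinish failed: invalid command queue
I traced the cause to line 48 of the following snippet.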
1: typedef struct {
2:     int global_index;
3:     int local_index;
4:     float speed_limit;
5:     float width;
6: } segment_t;
7:
8: typedef struct {
9:     int item_count;
10:     segment_t first_item;
11: } segment_list_t;
12:
13: void explode_segment_list_t(segment_list_t* list, segment_t** array)
14: {
15:     array[0] = &(list->first_item);
16: }
17:
18:
19:
20: /*
21:  * ro_data is read-only array of 3316 byte (829 int)
22:  * wo_data is write-only array of 3316 byte (829 int)
23:  */
24: __kernel void test_kernel(global int* ro_data, global int* wo_data)
25: {
26:     unsigned int i = get_global_id(0);
27:
28:     // copy uncasted, primitive types
29:     for(int index = 0; index < ro_data[0]; index++)
30:         wo_data[index] = ro_data[index]; // this works
31:
32:     // access casted local struct
33:     int temp[829] = {0};
34:     segment_list_t* casted_temp_list = (segment_list_t*)temp;
35:     casted_temp_list->item_count = 1337; // this works
36:     // do more tests
37:     segment_t* casted_temp_array;
38:     explode_segment_list_t(casted_temp_list, &casted_temp_array);
39:     casted_temp_array[1].global_index = 1;
40:     casted_temp_array[2].global_index = 2; // this works
41:
42:     // copy local data to global data
43:     for(int index = 0; index < ro_data[0]; index++)
44:         wo_data[index] = temp[index]; // this works
45:
46:     // access casted global memory
47:     segment_list_t* casted_wo_data = (segment_list_t*)wo_data;
48:     casted_wo_data->item_count = 42; // this fails on GPU but works on CPU
49:
50: }
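An ugly, memory-wasting workaround is to allocate a local array, copy the data into it, and only then cast. But I am sure I am doing something wrong here... but what?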
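For reference, the copy-then-cast workaround could look roughly like the sketch below. It is only an illustration under stated assumptions: the kernel name `copy_then_cast`, the `DATA_INTS` constant, and the host shim at the top (which blanks out `__kernel` and `global` so the same source also compiles as plain C for a quick check) are hypothetical and not part of the original code.

```c
/* Host shim (assumption): an OpenCL compiler defines __OPENCL_VERSION__ and
 * skips this part; a plain C compiler sees the qualifiers as empty macros. */
#ifndef __OPENCL_VERSION__
#define __kernel
#define global
#endif

#define DATA_INTS 829 /* 3316 bytes / 4 bytes per int, as in the kernel comment */

typedef struct {
    int global_index;
    int local_index;
    float speed_limit;
    float width;
} segment_t;

typedef struct {
    int item_count;
    segment_t first_item;
} segment_list_t;

__kernel void copy_then_cast(global int* ro_data, global int* wo_data)
{
    /* Private copy of the whole buffer: this is the wasted memory. */
    int temp[DATA_INTS];
    for (int index = 0; index < DATA_INTS; index++)
        temp[index] = ro_data[index];

    /* Cast the private copy, never the global pointer. */
    segment_list_t* list = (segment_list_t*)temp;
    list->item_count = 42;

    /* Copy the modified data back to global memory. */
    for (int index = 0; index < DATA_INTS; index++)
        wo_data[index] = temp[index];
}
```

Every work-item that runs this kernel holds its own 3316-byte private copy, which is why the approach scales badly.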
Thanks for your help!
Edit: On AMD devices (CPU and GPU), it fails with a more informative message:
error: invalid type conversion
segment_list_t* casted_wo_data = (segment_list_t*)wo_data;
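A possible reading of the AMD message, offered as an assumption rather than a confirmed diagnosis: in OpenCL C a pointer declared without an address space qualifier points to private memory, and a pointer to one address space may not be cast to a pointer to another. The cast on line 47 would then be moving a `global` pointer into the private address space. A minimal sketch of the same cast with the qualifier kept follows; the kernel name `cast_with_qualifier` and the host shim macros are hypothetical and only there so the snippet can also be compiled as plain C.

```c
/* Host shim (assumption): an OpenCL compiler defines __OPENCL_VERSION__ and
 * skips this part; a plain C compiler sees the qualifiers as empty macros. */
#ifndef __OPENCL_VERSION__
#define __kernel
#define global
#endif

typedef struct {
    int global_index;
    int local_index;
    float speed_limit;
    float width;
} segment_t;

typedef struct {
    int item_count;
    segment_t first_item;
} segment_list_t;

__kernel void cast_with_qualifier(global int* wo_data)
{
    /* Keep `global` on both the pointer type and the cast, so the pointer
     * stays in the same address space as wo_data. */
    global segment_list_t* casted_wo_data = (global segment_list_t*)wo_data;
    casted_wo_data->item_count = 42;
}
```

If this reading is right, a helper such as `explode_segment_list_t` would also need `global`-qualified parameters before it could be used on `wo_data`.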