经过一番挣扎:
在带有amdgpu 17.50的全新Ubuntu 16.04上使用 OpenCL 成功构建了 Tensorflow 。
安装了 5 个相同的 GPU (rx580),所有这些 GPU 都按预期由 clinfo 和 computecpp_info 报告。
运行 MNIST convnet 示例,TF 可以工作,但只使用 GPU0 而没有看到其他 GPU。
dmesg中没有报关于卡的错误,他们似乎都在最底层准备好了,不知道为什么SYCL似乎忽略了一些卡。
这是computecpp_info输出:
********************************************************************************
ComputeCpp Info (CE 1.0.1)
SYCL 1.2.1 revision 3
********************************************************************************
Toolchain information:
GLIBC version: 2.23
GLIBCXX: 20160609
This version of libstdc++ is supported.
********************************************************************************
Device Info:
Discovered 5 devices matching:
platform : <any>
device type : <any>
--------------------------------------------------------------------------------
Device 0:
Device is supported : UNTESTED - Vendor not tested on this OS
CL_DEVICE_NAME : Ellesmere
CL_DEVICE_VENDOR : Advanced Micro Devices, Inc.
CL_DRIVER_VERSION : 2527.3
CL_DEVICE_TYPE : CL_DEVICE_TYPE_GPU
--------------------------------------------------------------------------------
Device 1:
Device is supported : UNTESTED - Vendor not tested on this OS
CL_DEVICE_NAME : Ellesmere
CL_DEVICE_VENDOR : Advanced Micro Devices, Inc.
CL_DRIVER_VERSION : 2527.3
CL_DEVICE_TYPE : CL_DEVICE_TYPE_GPU
--------------------------------------------------------------------------------
Device 2:
Device is supported : UNTESTED - Vendor not tested on this OS
CL_DEVICE_NAME : Ellesmere
CL_DEVICE_VENDOR : Advanced Micro Devices, Inc.
CL_DRIVER_VERSION : 2527.3
CL_DEVICE_TYPE : CL_DEVICE_TYPE_GPU
--------------------------------------------------------------------------------
Device 3:
Device is supported : UNTESTED - Vendor not tested on this OS
CL_DEVICE_NAME : Ellesmere
CL_DEVICE_VENDOR : Advanced Micro Devices, Inc.
CL_DRIVER_VERSION : 2527.3
CL_DEVICE_TYPE : CL_DEVICE_TYPE_GPU
--------------------------------------------------------------------------------
Device 4:
Device is supported : UNTESTED - Vendor not tested on this OS
CL_DEVICE_NAME : Ellesmere
CL_DEVICE_VENDOR : Advanced Micro Devices, Inc.
CL_DRIVER_VERSION : 2527.3
CL_DEVICE_TYPE : CL_DEVICE_TYPE_GPU
If you encounter problems when using any of these OpenCL devices, please consult
this website for known issues:
https://computecpp.codeplay.com/releases/v1.0.1/platform-support-notes
********************************************************************************
这是来自tensorflow的列表:
$ python3 list_gpus.py
2018-10-17 23:52:44.268968: I ./tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
2018-10-17 23:52:44.385308: I ./tensorflow/core/common_runtime/sycl/sycl_device.h:70] Found following OpenCL devices:
2018-10-17 23:52:44.385342: I ./tensorflow/core/common_runtime/sycl/sycl_device.h:72] id: 0, type: GPU, name: Ellesmere, vendor: Advanced Micro Devices, Inc., profile: FULL_PROFILE
[name: "/device:CPU:0"
device_type: "CPU"
memory_limit: 268435456
locality {
}
incarnation: 5429869323017416982
, name: "/device:SYCL:0"
device_type: "SYCL"
memory_limit: 268435456
locality {
}
incarnation: 7347791393919061653
physical_device_desc: "id: 0, type: GPU, name: Ellesmere, vendor: Advanced Micro Devices, Inc., profile: FULL_PROFILE"
]
编辑:重启后
我真的不知道这些警告是否相关,因为它们在第一次运行后就消失了。
$ python3 list_gpus.py
2018-10-18 00:47:13.943021: I ./tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
2018-10-18 00:47:13.952909: W ./tensorflow/core/common_runtime/sycl/sycl_device.h:45] No OpenCL accelerator nor GPU found that is supported by ComputeCpp/triSYCL trying OpenCL CPU
2018-10-18 00:47:13.952930: W ./tensorflow/core/common_runtime/sycl/sycl_device.h:52] No OpenCL CPU found that is supported by ComputeCpp/triSYCL, checking for host sycl device
2018-10-18 00:47:13.952936: W ./tensorflow/core/common_runtime/sycl/sycl_device.h:59] Found SYCL host device
2018-10-18 00:47:13.953004: I ./tensorflow/core/common_runtime/sycl/sycl_device.h:70] Found following OpenCL devices:
2018-10-18 00:47:13.953014: I ./tensorflow/core/common_runtime/sycl/sycl_device.h:72] id: 0, type: Host, name: Host Device, vendor: Codeplay Software Ltd., profile: FULL_PROFILE
编辑:dmesg 详细信息
[ 0.000000] Linux version 4.15.0-36-generic (buildd@lcy01-amd64-017) (gcc version 5.4.0 20160609 (Ubuntu 5.4.0-6ubuntu1~16.04.10)) #39~16.04.1-Ubuntu SMP Tue Sep 25 08:59:23 UTC 2018 (Ubuntu 4.15.0-36.39~16.04.1-generic 4.15.18)
[ 0.688885] pcie_mp2_amd: AMD(R) PCI-E MP2 Communication Driver Version: 1.0
[ 1.143085] [drm] amdgpu kernel modesetting enabled.
[ 1.173931] amdgpu 0000:03:00.0: enabling device (0000 -> 0003)
[ 1.564757] amdgpu 0000:03:00.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=none:owns=none
[ 2.280211] amdgpu 0000:03:00.0: VRAM: 4096M 0x000000F400000000 - 0x000000F4FFFFFFFF (4096M used)
[ 2.280212] amdgpu 0000:03:00.0: GTT: 256M 0x0000000000000000 - 0x000000000FFFFFFF
[ 2.280322] [drm] amdgpu: 4096M of VRAM memory ready
[ 2.280323] [drm] amdgpu: 4096M of GTT memory ready.
[ 2.280427] amdgpu 0000:03:00.0: amdgpu: using MSI.
[ 2.280439] [drm] amdgpu: irq initialized.
[ 2.280452] amdgpu: [powerplay] amdgpu: powerplay sw initialized
[ 2.280690] amdgpu 0000:03:00.0: fence driver on ring 0 use gpu addr 0x0000000000400040, cpu addr 0x (ptrval)
[ 2.280758] amdgpu 0000:03:00.0: fence driver on ring 1 use gpu addr 0x00000000004000c0, cpu addr 0x (ptrval)
[ 2.280784] amdgpu 0000:03:00.0: fence driver on ring 2 use gpu addr 0x0000000000400140, cpu addr 0x (ptrval)
[ 2.280842] amdgpu 0000:03:00.0: fence driver on ring 3 use gpu addr 0x00000000004001c0, cpu addr 0x (ptrval)
[ 2.280903] amdgpu 0000:03:00.0: fence driver on ring 4 use gpu addr 0x0000000000400240, cpu addr 0x (ptrval)
[ 2.280965] amdgpu 0000:03:00.0: fence driver on ring 5 use gpu addr 0x00000000004002c0, cpu addr 0x (ptrval)
[ 2.280985] amdgpu 0000:03:00.0: fence driver on ring 6 use gpu addr 0x0000000000400340, cpu addr 0x (ptrval)
[ 2.281001] amdgpu 0000:03:00.0: fence driver on ring 7 use gpu addr 0x00000000004003c0, cpu addr 0x (ptrval)
[ 2.281015] amdgpu 0000:03:00.0: fence driver on ring 8 use gpu addr 0x0000000000400440, cpu addr 0x (ptrval)
[ 2.281028] amdgpu 0000:03:00.0: fence driver on ring 9 use gpu addr 0x00000000004004e0, cpu addr 0x (ptrval)
[ 2.281332] amdgpu 0000:03:00.0: fence driver on ring 10 use gpu addr 0x0000000000400560, cpu addr 0x (ptrval)
[ 2.281348] amdgpu 0000:03:00.0: fence driver on ring 11 use gpu addr 0x00000000004005e0, cpu addr 0x (ptrval)
[ 2.285039] amdgpu 0000:03:00.0: fence driver on ring 12 use gpu addr 0x000000f4001e6420, cpu addr 0x (ptrval)
[ 2.285056] amdgpu 0000:03:00.0: fence driver on ring 13 use gpu addr 0x00000000004006e0, cpu addr 0x (ptrval)
[ 2.285069] amdgpu 0000:03:00.0: fence driver on ring 14 use gpu addr 0x0000000000400760, cpu addr 0x (ptrval)
[ 2.285578] amdgpu 0000:03:00.0: fence driver on ring 15 use gpu addr 0x00000000004007e0, cpu addr 0x (ptrval)
[ 2.285594] amdgpu 0000:03:00.0: fence driver on ring 16 use gpu addr 0x0000000000400860, cpu addr 0x (ptrval)
[ 2.980155] amdgpu 0000:03:00.0: kfd not supported on this ASIC
[ 2.980163] [drm] Initialized amdgpu 3.23.0 20150101 for 0000:03:00.0 on minor 0
[ 2.980215] amdgpu 0000:06:00.0: enabling device (0000 -> 0003)
[ 4.068205] amdgpu 0000:06:00.0: VRAM: 4096M 0x000000F400000000 - 0x000000F4FFFFFFFF (4096M used)
[ 4.068206] amdgpu 0000:06:00.0: GTT: 256M 0x0000000000000000 - 0x000000000FFFFFFF
[ 4.068220] [drm] amdgpu: 4096M of VRAM memory ready
[ 4.068221] [drm] amdgpu: 4096M of GTT memory ready.
[ 4.068331] amdgpu 0000:06:00.0: amdgpu: using MSI.
[ 4.068344] [drm] amdgpu: irq initialized.
[ 4.068357] amdgpu: [powerplay] amdgpu: powerplay sw initialized
[ 4.068444] amdgpu 0000:06:00.0: fence driver on ring 0 use gpu addr 0x0000000000400040, cpu addr 0x (ptrval)
[ 4.068509] amdgpu 0000:06:00.0: fence driver on ring 1 use gpu addr 0x00000000004000c0, cpu addr 0x (ptrval)
[ 4.068571] amdgpu 0000:06:00.0: fence driver on ring 2 use gpu addr 0x0000000000400140, cpu addr 0x (ptrval)
[ 4.068639] amdgpu 0000:06:00.0: fence driver on ring 3 use gpu addr 0x00000000004001c0, cpu addr 0x (ptrval)
[ 4.068665] amdgpu 0000:06:00.0: fence driver on ring 4 use gpu addr 0x0000000000400240, cpu addr 0x (ptrval)
[ 4.068718] amdgpu 0000:06:00.0: fence driver on ring 5 use gpu addr 0x00000000004002c0, cpu addr 0x (ptrval)
[ 4.068740] amdgpu 0000:06:00.0: fence driver on ring 6 use gpu addr 0x0000000000400340, cpu addr 0x (ptrval)
[ 4.068759] amdgpu 0000:06:00.0: fence driver on ring 7 use gpu addr 0x00000000004003c0, cpu addr 0x (ptrval)
[ 4.068774] amdgpu 0000:06:00.0: fence driver on ring 8 use gpu addr 0x0000000000400440, cpu addr 0x (ptrval)
[ 4.068787] amdgpu 0000:06:00.0: fence driver on ring 9 use gpu addr 0x00000000004004e0, cpu addr 0x (ptrval)
[ 4.069074] amdgpu 0000:06:00.0: fence driver on ring 10 use gpu addr 0x0000000000400560, cpu addr 0x (ptrval)
[ 4.069094] amdgpu 0000:06:00.0: fence driver on ring 11 use gpu addr 0x00000000004005e0, cpu addr 0x (ptrval)
[ 4.072854] amdgpu 0000:06:00.0: fence driver on ring 12 use gpu addr 0x000000f4001e6420, cpu addr 0x (ptrval)
[ 4.072868] amdgpu 0000:06:00.0: fence driver on ring 13 use gpu addr 0x00000000004006e0, cpu addr 0x (ptrval)
[ 4.072881] amdgpu 0000:06:00.0: fence driver on ring 14 use gpu addr 0x0000000000400760, cpu addr 0x (ptrval)
[ 4.073362] amdgpu 0000:06:00.0: fence driver on ring 15 use gpu addr 0x00000000004007e0, cpu addr 0x (ptrval)
[ 4.073376] amdgpu 0000:06:00.0: fence driver on ring 16 use gpu addr 0x0000000000400860, cpu addr 0x (ptrval)
[ 4.771466] amdgpu 0000:06:00.0: kfd not supported on this ASIC
[ 4.771476] [drm] Initialized amdgpu 3.23.0 20150101 for 0000:06:00.0 on minor 2
[ 4.771515] amdgpu 0000:07:00.0: enabling device (0000 -> 0003)
[ 5.856168] amdgpu 0000:07:00.0: VRAM: 4096M 0x000000F400000000 - 0x000000F4FFFFFFFF (4096M used)
[ 5.856169] amdgpu 0000:07:00.0: GTT: 256M 0x0000000000000000 - 0x000000000FFFFFFF
[ 5.856178] [drm] amdgpu: 4096M of VRAM memory ready
[ 5.856179] [drm] amdgpu: 4096M of GTT memory ready.
[ 5.856284] amdgpu 0000:07:00.0: amdgpu: using MSI.
[ 5.856297] [drm] amdgpu: irq initialized.
[ 5.856311] amdgpu: [powerplay] amdgpu: powerplay sw initialized
[ 5.856402] amdgpu 0000:07:00.0: fence driver on ring 0 use gpu addr 0x0000000000400040, cpu addr 0x (ptrval)
[ 5.856441] amdgpu 0000:07:00.0: fence driver on ring 1 use gpu addr 0x00000000004000c0, cpu addr 0x (ptrval)
[ 5.856464] amdgpu 0000:07:00.0: fence driver on ring 2 use gpu addr 0x0000000000400140, cpu addr 0x (ptrval)
[ 5.856541] amdgpu 0000:07:00.0: fence driver on ring 3 use gpu addr 0x00000000004001c0, cpu addr 0x (ptrval)
[ 5.856569] amdgpu 0000:07:00.0: fence driver on ring 4 use gpu addr 0x0000000000400240, cpu addr 0x (ptrval)
[ 5.856641] amdgpu 0000:07:00.0: fence driver on ring 5 use gpu addr 0x00000000004002c0, cpu addr 0x (ptrval)
[ 5.856668] amdgpu 0000:07:00.0: fence driver on ring 6 use gpu addr 0x0000000000400340, cpu addr 0x (ptrval)
[ 5.856690] amdgpu 0000:07:00.0: fence driver on ring 7 use gpu addr 0x00000000004003c0, cpu addr 0x (ptrval)
[ 5.856707] amdgpu 0000:07:00.0: fence driver on ring 8 use gpu addr 0x0000000000400440, cpu addr 0x (ptrval)
[ 5.856722] amdgpu 0000:07:00.0: fence driver on ring 9 use gpu addr 0x00000000004004e0, cpu addr 0x (ptrval)
[ 5.857007] amdgpu 0000:07:00.0: fence driver on ring 10 use gpu addr 0x0000000000400560, cpu addr 0x (ptrval)
[ 5.857027] amdgpu 0000:07:00.0: fence driver on ring 11 use gpu addr 0x00000000004005e0, cpu addr 0x (ptrval)
[ 5.860789] amdgpu 0000:07:00.0: fence driver on ring 12 use gpu addr 0x000000f4001e6420, cpu addr 0x (ptrval)
[ 5.860803] amdgpu 0000:07:00.0: fence driver on ring 13 use gpu addr 0x00000000004006e0, cpu addr 0x (ptrval)
[ 5.860817] amdgpu 0000:07:00.0: fence driver on ring 14 use gpu addr 0x0000000000400760, cpu addr 0x (ptrval)
[ 5.861298] amdgpu 0000:07:00.0: fence driver on ring 15 use gpu addr 0x00000000004007e0, cpu addr 0x (ptrval)
[ 5.861313] amdgpu 0000:07:00.0: fence driver on ring 16 use gpu addr 0x0000000000400860, cpu addr 0x (ptrval)
[ 6.563837] amdgpu 0000:07:00.0: kfd not supported on this ASIC
[ 6.563845] [drm] Initialized amdgpu 3.23.0 20150101 for 0000:07:00.0 on minor 3
[ 6.563887] amdgpu 0000:08:00.0: enabling device (0000 -> 0003)
[ 7.648177] amdgpu 0000:08:00.0: VRAM: 4096M 0x000000F400000000 - 0x000000F4FFFFFFFF (4096M used)
[ 7.648178] amdgpu 0000:08:00.0: GTT: 256M 0x0000000000000000 - 0x000000000FFFFFFF
[ 7.648188] [drm] amdgpu: 4096M of VRAM memory ready
[ 7.648188] [drm] amdgpu: 4096M of GTT memory ready.
[ 7.648292] amdgpu 0000:08:00.0: amdgpu: using MSI.
[ 7.648306] [drm] amdgpu: irq initialized.
[ 7.648322] amdgpu: [powerplay] amdgpu: powerplay sw initialized
[ 7.648406] amdgpu 0000:08:00.0: fence driver on ring 0 use gpu addr 0x0000000000400040, cpu addr 0x (ptrval)
[ 7.648470] amdgpu 0000:08:00.0: fence driver on ring 1 use gpu addr 0x00000000004000c0, cpu addr 0x (ptrval)
[ 7.648530] amdgpu 0000:08:00.0: fence driver on ring 2 use gpu addr 0x0000000000400140, cpu addr 0x (ptrval)
[ 7.648593] amdgpu 0000:08:00.0: fence driver on ring 3 use gpu addr 0x00000000004001c0, cpu addr 0x (ptrval)
[ 7.648649] amdgpu 0000:08:00.0: fence driver on ring 4 use gpu addr 0x0000000000400240, cpu addr 0x (ptrval)
[ 7.648707] amdgpu 0000:08:00.0: fence driver on ring 5 use gpu addr 0x00000000004002c0, cpu addr 0x (ptrval)
[ 7.648733] amdgpu 0000:08:00.0: fence driver on ring 6 use gpu addr 0x0000000000400340, cpu addr 0x (ptrval)
[ 7.648751] amdgpu 0000:08:00.0: fence driver on ring 7 use gpu addr 0x00000000004003c0, cpu addr 0x (ptrval)
[ 7.648769] amdgpu 0000:08:00.0: fence driver on ring 8 use gpu addr 0x0000000000400440, cpu addr 0x (ptrval)
[ 7.648782] amdgpu 0000:08:00.0: fence driver on ring 9 use gpu addr 0x00000000004004e0, cpu addr 0x (ptrval)
[ 7.649069] amdgpu 0000:08:00.0: fence driver on ring 10 use gpu addr 0x0000000000400560, cpu addr 0x (ptrval)
[ 7.649087] amdgpu 0000:08:00.0: fence driver on ring 11 use gpu addr 0x00000000004005e0, cpu addr 0x (ptrval)
[ 7.652849] amdgpu 0000:08:00.0: fence driver on ring 12 use gpu addr 0x000000f4001e6420, cpu addr 0x (ptrval)
[ 7.652862] amdgpu 0000:08:00.0: fence driver on ring 13 use gpu addr 0x00000000004006e0, cpu addr 0x (ptrval)
[ 7.652874] amdgpu 0000:08:00.0: fence driver on ring 14 use gpu addr 0x0000000000400760, cpu addr 0x (ptrval)
[ 7.653353] amdgpu 0000:08:00.0: fence driver on ring 15 use gpu addr 0x00000000004007e0, cpu addr 0x (ptrval)
[ 7.653366] amdgpu 0000:08:00.0: fence driver on ring 16 use gpu addr 0x0000000000400860, cpu addr 0x (ptrval)
[ 8.355909] amdgpu 0000:08:00.0: kfd not supported on this ASIC
[ 8.355916] [drm] Initialized amdgpu 3.23.0 20150101 for 0000:08:00.0 on minor 4
[ 8.355957] amdgpu 0000:09:00.0: enabling device (0000 -> 0003)
[ 9.440257] amdgpu 0000:09:00.0: VRAM: 4096M 0x000000F400000000 - 0x000000F4FFFFFFFF (4096M used)
[ 9.440258] amdgpu 0000:09:00.0: GTT: 256M 0x0000000000000000 - 0x000000000FFFFFFF
[ 9.440268] [drm] amdgpu: 4096M of VRAM memory ready
[ 9.440268] [drm] amdgpu: 4096M of GTT memory ready.
[ 9.440376] amdgpu 0000:09:00.0: amdgpu: using MSI.
[ 9.440390] [drm] amdgpu: irq initialized.
[ 9.440406] amdgpu: [powerplay] amdgpu: powerplay sw initialized
[ 9.440499] amdgpu 0000:09:00.0: fence driver on ring 0 use gpu addr 0x0000000000400040, cpu addr 0x (ptrval)
[ 9.440563] amdgpu 0000:09:00.0: fence driver on ring 1 use gpu addr 0x00000000004000c0, cpu addr 0x (ptrval)
[ 9.440625] amdgpu 0000:09:00.0: fence driver on ring 2 use gpu addr 0x0000000000400140, cpu addr 0x (ptrval)
[ 9.440690] amdgpu 0000:09:00.0: fence driver on ring 3 use gpu addr 0x00000000004001c0, cpu addr 0x (ptrval)
[ 9.440753] amdgpu 0000:09:00.0: fence driver on ring 4 use gpu addr 0x0000000000400240, cpu addr 0x (ptrval)
[ 9.440808] amdgpu 0000:09:00.0: fence driver on ring 5 use gpu addr 0x00000000004002c0, cpu addr 0x (ptrval)
[ 9.440831] amdgpu 0000:09:00.0: fence driver on ring 6 use gpu addr 0x0000000000400340, cpu addr 0x (ptrval)
[ 9.440849] amdgpu 0000:09:00.0: fence driver on ring 7 use gpu addr 0x00000000004003c0, cpu addr 0x (ptrval)
[ 9.440865] amdgpu 0000:09:00.0: fence driver on ring 8 use gpu addr 0x0000000000400440, cpu addr 0x (ptrval)
[ 9.440880] amdgpu 0000:09:00.0: fence driver on ring 9 use gpu addr 0x00000000004004e0, cpu addr 0x (ptrval)
[ 9.441167] amdgpu 0000:09:00.0: fence driver on ring 10 use gpu addr 0x0000000000400560, cpu addr 0x (ptrval)
[ 9.441184] amdgpu 0000:09:00.0: fence driver on ring 11 use gpu addr 0x00000000004005e0, cpu addr 0x (ptrval)
[ 9.444946] amdgpu 0000:09:00.0: fence driver on ring 12 use gpu addr 0x000000f4001e6420, cpu addr 0x (ptrval)
[ 9.444964] amdgpu 0000:09:00.0: fence driver on ring 13 use gpu addr 0x00000000004006e0, cpu addr 0x (ptrval)
[ 9.444976] amdgpu 0000:09:00.0: fence driver on ring 14 use gpu addr 0x0000000000400760, cpu addr 0x (ptrval)
[ 9.445456] amdgpu 0000:09:00.0: fence driver on ring 15 use gpu addr 0x00000000004007e0, cpu addr 0x (ptrval)
[ 9.445469] amdgpu 0000:09:00.0: fence driver on ring 16 use gpu addr 0x0000000000400860, cpu addr 0x (ptrval)
[ 10.147558] amdgpu 0000:09:00.0: kfd not supported on this ASIC
[ 10.147564] [drm] Initialized amdgpu 3.23.0 20150101 for 0000:09:00.0 on minor 5
[ 10.147606] amdgpu 0000:0a:00.0: enabling device (0000 -> 0003)
[ 11.232197] amdgpu 0000:0a:00.0: VRAM: 4096M 0x000000F400000000 - 0x000000F4FFFFFFFF (4096M used)
[ 11.232198] amdgpu 0000:0a:00.0: GTT: 256M 0x0000000000000000 - 0x000000000FFFFFFF
[ 11.232207] [drm] amdgpu: 4096M of VRAM memory ready
[ 11.232207] [drm] amdgpu: 4096M of GTT memory ready.
[ 11.232309] amdgpu 0000:0a:00.0: amdgpu: using MSI.
[ 11.232322] [drm] amdgpu: irq initialized.
[ 11.232337] amdgpu: [powerplay] amdgpu: powerplay sw initialized
[ 11.232427] amdgpu 0000:0a:00.0: fence driver on ring 0 use gpu addr 0x0000000000400040, cpu addr 0x (ptrval)
[ 11.232488] amdgpu 0000:0a:00.0: fence driver on ring 1 use gpu addr 0x00000000004000c0, cpu addr 0x (ptrval)
[ 11.232551] amdgpu 0000:0a:00.0: fence driver on ring 2 use gpu addr 0x0000000000400140, cpu addr 0x (ptrval)
[ 11.232615] amdgpu 0000:0a:00.0: fence driver on ring 3 use gpu addr 0x00000000004001c0, cpu addr 0x (ptrval)
[ 11.232675] amdgpu 0000:0a:00.0: fence driver on ring 4 use gpu addr 0x0000000000400240, cpu addr 0x (ptrval)
[ 11.232699] amdgpu 0000:0a:00.0: fence driver on ring 5 use gpu addr 0x00000000004002c0, cpu addr 0x (ptrval)
[ 11.232717] amdgpu 0000:0a:00.0: fence driver on ring 6 use gpu addr 0x0000000000400340, cpu addr 0x (ptrval)
[ 11.232735] amdgpu 0000:0a:00.0: fence driver on ring 7 use gpu addr 0x00000000004003c0, cpu addr 0x (ptrval)
[ 11.232749] amdgpu 0000:0a:00.0: fence driver on ring 8 use gpu addr 0x0000000000400440, cpu addr 0x (ptrval)
[ 11.232763] amdgpu 0000:0a:00.0: fence driver on ring 9 use gpu addr 0x00000000004004e0, cpu addr 0x (ptrval)
[ 11.233048] amdgpu 0000:0a:00.0: fence driver on ring 10 use gpu addr 0x0000000000400560, cpu addr 0x (ptrval)
[ 11.233067] amdgpu 0000:0a:00.0: fence driver on ring 11 use gpu addr 0x00000000004005e0, cpu addr 0x (ptrval)
[ 11.236830] amdgpu 0000:0a:00.0: fence driver on ring 12 use gpu addr 0x000000f4001e6420, cpu addr 0x (ptrval)
[ 11.236848] amdgpu 0000:0a:00.0: fence driver on ring 13 use gpu addr 0x00000000004006e0, cpu addr 0x (ptrval)
[ 11.236860] amdgpu 0000:0a:00.0: fence driver on ring 14 use gpu addr 0x0000000000400760, cpu addr 0x (ptrval)
[ 11.237341] amdgpu 0000:0a:00.0: fence driver on ring 15 use gpu addr 0x00000000004007e0, cpu addr 0x (ptrval)
[ 11.237355] amdgpu 0000:0a:00.0: fence driver on ring 16 use gpu addr 0x0000000000400860, cpu addr 0x (ptrval)
[ 11.939330] amdgpu 0000:0a:00.0: kfd not supported on this ASIC
[ 11.939336] [drm] Initialized amdgpu 3.23.0 20150101 for 0000:0a:00.0 on minor 6
编辑:它与任何特定的卡无关,只是总线顺序中的第一个可用。
我尝试断开一些卡的连接,在所有测试之后,似乎很清楚 SYCL 总是只列出第一个 GPU,不管是哪一个,总是最小的可用总线号。
这也证实了卡之间没有差异,并且所有卡都可以使用(至少单独使用),所以我认为操作系统很好,我猜问题出在 SYCL 中。
请帮忙!