
I am developing with Accelio on top of softRoCE.

IB devices configured:
# ibv_devices 
    device                 node GUID
    ------              ----------------
    rxe1                821f02fffef91598
    rxe0                d6bed9fffebe94af
Error while running the Accelio client:
# xio_ow_client 
 =============================================
 Server Address     : 127.0.0.1
 Server Port        : 2061
 Transport      : rdma
 Header Length      : 32
 Data Length        : 32
 Connection Index   : 0
 CPU Affinity       : 0
 Finite run     : 0
 =============================================
**** starting ...
session event: connection error. reason: No such device

# rping -c
rdma_resolve_route: No such device

So I checked the OpenSM status:

# /etc/init.d/opensmd status
opensm is stopped
# /etc/init.d/opensmd start
opensm start [FAILED]

# tail -f /var/log/opensm.log 
Jul 09 15:04:45 655213 [AA4F3700] 0x03 -> OpenSM 3.3.7
Jul 09 15:04:45 692960 [AA4F3700] 0x80 -> OpenSM 3.3.7
Jul 09 15:04:45 693149 [AA4F3700] 0x02 -> osm_vendor_init: 1000 pending umads specified
Jul 09 15:04:45 797977 [AA4F3700] 0x80 -> Entering DISCOVERING state
Jul 09 15:04:45 799152 [AA4F3700] 0x02 -> osm_vendor_bind: Binding to port 0xd6bed9fffebe94af
Jul 09 15:04:45 800414 [AA4F3700] 0x01 -> osm_vendor_bind: ERR 5426: Unable to register class 129 version 1
Jul 09 15:04:45 800422 [AA4F3700] 0x01 -> osm_sm_mad_ctrl_bind: ERR 3118: Vendor specific bind failed
Jul 09 15:04:45 800425 [AA4F3700] 0x01 -> osm_sm_bind: ERR 2E10: SM MAD Controller bind failed (IB_ERROR)
Jul 09 15:04:45 800430 [AA4F3700] 0x01 -> osm_sa_mad_ctrl_unbind: ERR 1A11: No previous bind
Jul 09 15:04:45 829702 [AA4F3700] 0x80 -> Exiting SM

I would appreciate some pointers so I can understand where I went wrong.


1 Answer


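RoCE devices do not need OpenSM. OpenSM is the subnet manager for InfiniBand fabrics. RoCE runs over Ethernet, so there is no InfiniBand subnet to manage, and OpenSM cannot register its subnet-management MAD class on a RoCE port (this is the "Unable to register class 129" error in your log). It is therefore expected that OpenSM fails to start when you only have RoCE devices.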
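To confirm that your ports are RoCE rather than native InfiniBand, look at the "link_layer:" line that `ibv_devinfo` prints for each port. Note that the "transport:" line reads InfiniBand even on RoCE devices, so it is not the right thing to check. The helper below is a hypothetical sketch, not part of any package. It assumes the usual `ibv_devinfo` text layout ("hca_id: <dev>" lines followed by indented "link_layer: <type>" lines) and reports whether a subnet manager is needed at all.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with OFED/rdma-core): reads `ibv_devinfo`
# output on stdin, prints the link layer of every port, and reports whether
# any port is native InfiniBand, i.e. whether OpenSM would be needed.
ib_ports_need_sm() {
  awk '
    /^hca_id:/    { dev = $2 }                  # current device name, e.g. rxe0
    /link_layer:/ {                             # one line per port
      print dev ": " $2
      if ($2 == "InfiniBand") ib = 1            # Ethernet means RoCE, no SM needed
    }
    END { print "subnet manager needed: " (ib ? "yes" : "no") }
  '
}

# Usage on a real host:
#   ibv_devinfo | ib_ports_need_sm
```

For the two rxe devices in the question, you would expect both ports to report Ethernet and the last line to read "subnet manager needed: no".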

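rping fails because you did not specify which server to connect to. Assuming the RoCE-capable interfaces on your machines are at IP 192.168.1.2 (server) and 192.168.1.3 (client), you should run the following commands: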

server$ rping -s -a 192.168.1.2
client$ rping -c -a 192.168.1.2

Thanks,

Shachar

answered 2015-07-20T10:50:22.840