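I'm building openmpi-1.4.5 from source on Gentoo with backward compatibility down to GLIBC_2.11, so that it can run over NFS on an HPC cluster that includes one Debian GNU/Linux (squeeze) compute node. It's like having two systems, each with its own set of libraries, that can both execute the same files across the high-performance network. The idea is that both operating systems can run programs through MPI. These are the steps I followed to configure and build: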
configure:
.././configure --prefix=/usr/local/ompi-compat --build=x86_64-pc-linux-gnu --with-openib CC="x86_64-pc-linux-gnu-gcc -include /usr/local/include/gcc-preinclude.h"
make:
make LDFLAGS="-Wl,-rpath -Wl,/spoa/usr/lib64 -Wl,-rpath -Wl,/usr/local/ompi-compat/lib" 2>&1 | tee make02.log
The make step then failed:
...
..
libtool: compile: x86_64-pc-linux-gnu-gcc -include /usr/local/include/gcc-preinclude.h -DHAVE_CONFIG_H -I. -I../../.././opal/asm -I../../opal/include -I../../orte/include -I../../ompi/include -I../../opal/mca/paffinity/linux/plpa/src/libplpa -I../../../. -I../.. -I../../.././opal/include -I../../.././orte/include -I../../.././ompi/include -O3 -DNDEBUG -finline-functions -fno-strict-aliasing -MT atomic-asm.lo -MD -MP -MF .deps/atomic-asm.Tpo -c atomic-asm.S -fPIC -DPIC -o .libs/atomic-asm.o
/usr/local/include/gcc-preinclude.h: Assembler messages:
/usr/local/include/gcc-preinclude.h:1: Error: invalid character '(' in mnemonic
make[2]: [atomic-asm.lo] Error 1
make[2]: Leaving directory `/usr/local/src/openmpi-1.4.5/build/opal/asm'
make[1]: [all-recursive] Error 1
make[1]: Leaving directory `/usr/local/src/openmpi-1.4.5/build/opal'
make: [all-recursive] Error 1
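The failure is in the asm module. CC force-includes /usr/local/include/gcc-preinclude.h, a C header I use to bind memcpy to the old memcpy@GLIBC_2.2.5 symbol version (Debian squeeze ships glibc 2.11.3, which lacks the newer memcpy@GLIBC_2.14). Because the header is also prepended to the assembly source atomic-asm.S, the assembler chokes on its C syntax: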
__asm__(".symver memcpy,memcpy@GLIBC_2.2.5");
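Instead of editing the libtool command in opal/asm by hand, the header itself could be made safe for assembly sources. A minimal sketch, assuming the header contains only the .symver line above: GCC predefines `__ASSEMBLER__` when it preprocesses `.S` files, so guarding the directive with it should let the `-include` stay in CC for the whole build. The `__x86_64__` guard reflects that GLIBC_2.2.5 is the baseline symbol version on x86_64 only; `copy_bytes` is a hypothetical caller added purely for illustration.

```c
/* gcc-preinclude.h (sketch): pin memcpy to the old x86_64 symbol version,
 * but only when the header is pulled into C code. GCC defines
 * __ASSEMBLER__ while preprocessing .S files such as atomic-asm.S,
 * so the C-only __asm__ statement is skipped there. */
#if !defined(__ASSEMBLER__) && defined(__x86_64__)
__asm__(".symver memcpy,memcpy@GLIBC_2.2.5");
#endif

/* Ordinary C code below: when compiled with -include gcc-preinclude.h,
 * its memcpy references bind to memcpy@GLIBC_2.2.5. */
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy n bytes from src to dst and return dst. */
void *copy_bytes(void *dst, const void *src, size_t n)
{
    return memcpy(dst, src, n);
}
```

Running `objdump -T` on a binary built this way should list memcpy with the GLIBC_2.2.5 version rather than GLIBC_2.14.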
I worked around this in that directory by running the libtool command by hand, adding --tag=CC and removing "-include /usr/local/include/gcc-preinclude.h" from the x86_64-pc-linux-gnu-gcc command line:
/bin/sh ../../libtool --tag=CC --mode=compile x86_64-pc-linux-gnu-gcc -DHAVE_CONFIG_H -I. -I../../.././opal/asm -I../../opal/include -I../../orte/include -I../../ompi/include -I../../opal/mca/paffinity/linux/plpa/src/libplpa -I../../../. -I../.. -I../../.././opal/include -I../../.././orte/include -I../../.././ompi/include -O3 -DNDEBUG -finline-functions -fno-strict-aliasing -MT atomic-asm.lo -MD -MP -MF $depbase.Tpo -c -o atomic-asm.lo atomic-asm.S &&mv -f $depbase.Tpo $depbase.Plo
Running this line in the opal/asm/ directory compiled fine!
Then I went back to the main build directory and resumed the build:
make LDFLAGS="-Wl,-rpath -Wl,/spoa/usr/lib64 -Wl,-rpath -Wl,/usr/local/ompi-compat/lib" 2>&1 | tee make02.log
A few minutes later the build completed.
But when I tried to run any of the resulting executables, I got a segmentation fault:
cd opal/tools/wrappers/.libs
.libs # ./opal_wrapper
Segmentation fault
The same happens for all the executables...
Now, verifying the shared library paths linked into the ELF file:
ldd opal_wrapper
linux-vdso.so.1 => (0x00007fff3ffff000)
libopen-pal.so.0 => /usr/local/ompi-compat/lib/libopen-pal.so.0 (0x00002b8ea8239000)
libdl.so.2 => /spoa/usr/lib64/libdl.so.2 (0x00002b8ea8493000)
libnsl.so.1 => /spoa/usr/lib64/libnsl.so.1 (0x00002b8ea8697000)
libutil.so.1 => /spoa/usr/lib64/libutil.so.1 (0x00002b8ea88af000)
libm.so.6 => /lib64/libm.so.6 (0x00002b8ea8ad5000)
libpthread.so.0 => /spoa/usr/lib64/libpthread.so.0 (0x00002b8ea8d56000)
libc.so.6 => /spoa/usr/lib64/libc.so.6 (0x00002b8ea8f72000)
/lib64/ld-linux-x86-64.so.2 (0x00002b8ea8017000)
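These shared objects are linked via -rpath: I copied the original shared objects shipped with Debian "squeeze" into /spoa/usr/lib64, and they are found correctly. But I think the failure comes from the executable still using Gentoo's own dynamic linker, /lib64/ld-linux-x86-64.so.2.
Here are the main shared objects from my Debian "squeeze" toolchain: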
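The kernel loads whichever dynamic linker path is recorded in the executable's PT_INTERP program header; -rpath does not change it, it only tells that linker where to look for libraries. In the ldd output above the interpreter is still Gentoo's /lib64/ld-linux-x86-64.so.2 while libc.so.6 comes from /spoa/usr/lib64, and ld.so and libc.so.6 generally have to come from the same glibc build, which is consistent with the segfault. `readelf -l opal_wrapper` reports this header as "Requesting program interpreter". The sketch below does the same lookup in C on an ELF image already read into memory, assuming a 64-bit ELF in host byte order; the name `elf64_interp` is hypothetical.

```c
#include <elf.h>
#include <stddef.h>
#include <string.h>

/* Return the PT_INTERP string (the dynamic linker the kernel will load)
 * from an in-memory 64-bit ELF image, or NULL if there is none or the
 * image is malformed. Assumes the file's byte order matches the host. */
const char *elf64_interp(const unsigned char *img, size_t len)
{
    Elf64_Ehdr eh;
    if (len < sizeof eh)
        return NULL;
    memcpy(&eh, img, sizeof eh);              /* copy to avoid unaligned reads */
    if (memcmp(eh.e_ident, ELFMAG, SELFMAG) != 0 ||
        eh.e_ident[EI_CLASS] != ELFCLASS64 ||
        eh.e_phentsize != sizeof(Elf64_Phdr) ||
        eh.e_phoff > len)
        return NULL;

    for (size_t i = 0; i < eh.e_phnum; i++) {
        size_t off = eh.e_phoff + i * sizeof(Elf64_Phdr);
        Elf64_Phdr ph;
        if (off > len || len - off < sizeof ph)
            return NULL;                      /* program header table truncated */
        memcpy(&ph, img + off, sizeof ph);
        if (ph.p_type != PT_INTERP)
            continue;
        /* The path must lie inside the image and be NUL-terminated. */
        if (ph.p_filesz == 0 || ph.p_offset > len ||
            len - ph.p_offset < ph.p_filesz ||
            img[ph.p_offset + ph.p_filesz - 1] != '\0')
            return NULL;
        return (const char *)img + ph.p_offset;
    }
    return NULL;                              /* e.g. a static executable */
}
```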
ls /spoa/usr/lib64/
ld-2.11.3.so libdl.so.2 libnsl-2.11.3.so libstdc++.so.6.0.13 libz.so.1.2.3.4
ld-linux-x86-64.so.2 libgcc_s.so libnsl.so.1 libutil-2.11.3.so
libc-2.11.3.so libgcc_s.so.1 libpthread-2.11.3.so libutil.so.1
libc.so.6 libgfortran.so.3.0.0 libpthread.so.0 libz.so
libdl-2.11.3.so libm.so libstdc++.so.6 libz.so.1
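If I manually relink this executable so that its dynamic linker is the copy shared by Gentoo and Debian (over NFS), it finally runs. The relink command: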
/bin/sh ../../../libtool --tag=CC --mode=link x86_64-pc-linux-gnu-gcc -include /usr/local/include/gcc-preinclude.h -O3 -DNDEBUG -finline-functions -fno-strict-aliasing -pthread -fvisibility=hidden -L/spoa/usr/lib64 -Wl,-rpath -Wl,/spoa/usr/lib64 -o opal_wrapper opal_wrapper.o ../../../opal/libopen-pal.la -lnsl -lutil -lm -Wl,-rpath -Wl,/usr/local/ompi-compat/lib -Wl,-dynamic-linker /spoa/usr/lib64/ld-linux-x86-64.so.2
Result:
libtool: link: x86_64-pc-linux-gnu-gcc -include /usr/local/include/gcc-preinclude.h -O3 -DNDEBUG -finline-functions -fno-strict-aliasing -pthread -fvisibility=hidden -Wl,-rpath -Wl,/spoa/usr/lib64 -o .libs/opal_wrapper opal_wrapper.o -Wl,-rpath -Wl,/usr/local/ompi-compat/lib -Wl,-dynamic-linker /spoa/usr/lib64/ld-linux-x86-64.so.2 -L/spoa/usr/lib64 ../../../opal/.libs/libopen-pal.so -ldl -lnsl -lutil -lm -pthread
and ldd now shows the intended dynamic linker:
wrappers ldd .libs/opal_wrapper
linux-vdso.so.1 => (0x00007fffc75a7000)
libopen-pal.so.0 => /usr/local/ompi-compat/lib/libopen-pal.so.0 (0x00002b396dd38000)
libdl.so.2 => /spoa/usr/lib64/libdl.so.2 (0x00002b396df91000)
libnsl.so.1 => /spoa/usr/lib64/libnsl.so.1 (0x00002b396e196000)
libutil.so.1 => /spoa/usr/lib64/libutil.so.1 (0x00002b396e3ae000)
libm.so.6 => /lib64/libm.so.6 (0x00002b396e5d3000)
libpthread.so.0 => /spoa/usr/lib64/libpthread.so.0 (0x00002b396e855000)
libc.so.6 => /spoa/usr/lib64/libc.so.6 (0x00002b396ea71000)
/spoa/usr/lib64/ld-linux-x86-64.so.2 (0x00002b396db18000)
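ldd predicts how libraries will resolve, but a program can also report what it actually loaded at run time. A minimal sketch using glibc's `dl_iterate_phdr()` and `gnu_get_libc_version()`, which a test program linked with the same -rpath and -dynamic-linker flags could call; the helper names `loaded_object_path` and `running_glibc` are hypothetical. Note that both ldd listings above still resolve libm.so.6 to /lib64/libm.so.6: the /spoa/usr/lib64 listing contains libm.so but no libm.so.6, so the loader falls back to Gentoo's copy, and a check like this would surface that.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE            /* needed for dl_iterate_phdr in <link.h> */
#endif
#include <gnu/libc-version.h>
#include <link.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Callback state: substring to search for and where to store the match. */
struct lookup {
    const char *needle;
    char *out;
    size_t outlen;
};

static int match_object(struct dl_phdr_info *info, size_t size, void *data)
{
    struct lookup *lk = data;
    (void)size;
    if (info->dlpi_name && strstr(info->dlpi_name, lk->needle)) {
        snprintf(lk->out, lk->outlen, "%s", info->dlpi_name);
        return 1;              /* non-zero stops the iteration */
    }
    return 0;
}

/* Copy the path of the first loaded object whose name contains needle
 * (e.g. "libc.so", "libm.so") into out; return 1 if found, 0 otherwise. */
int loaded_object_path(const char *needle, char *out, size_t outlen)
{
    struct lookup lk = { needle, out, outlen };
    return dl_iterate_phdr(match_object, &lk) != 0;
}

/* Version string of the glibc actually running this process. */
const char *running_glibc(void)
{
    return gnu_get_libc_version();
}
```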
My question: is -Wl,-dynamic-linker a proper way to produce a correct ELF that links against the appropriate shared object files?
Running: ./opal_wrapper
Could not open configuration file /usr/local/ompi-compat/share/openmpi/opal_wrapper-wrapper-data.txt
Error parsing data file opal_wrapper: Not found
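This message suggests the segfault is gone and the binary now gets far enough to read its configuration. Judging from the path in the message, the wrapper looks for a data file named after the name it was invoked under; in an installed Open MPI the compiler wrappers such as mpicc are links to opal_wrapper with matching *-wrapper-data.txt files in share/openmpi, put there by make install. The sketch below only reproduces that naming rule as inferred from the message; `wrapper_data_path` is a hypothetical helper, not Open MPI's code.

```c
#include <stdio.h>
#include <string.h>

/* Build <pkgdatadir>/<basename of argv0>-wrapper-data.txt, the pattern
 * visible in the error message. Returns 0 on success, -1 if the result
 * does not fit in out. */
int wrapper_data_path(const char *pkgdatadir, const char *argv0,
                      char *out, size_t outlen)
{
    const char *slash = strrchr(argv0, '/');
    const char *base = slash ? slash + 1 : argv0;  /* "./opal_wrapper" -> "opal_wrapper" */
    int n = snprintf(out, outlen, "%s/%s-wrapper-data.txt", pkgdatadir, base);
    return (n < 0 || (size_t)n >= outlen) ? -1 : 0;
}
```

Under this rule, invoking the same binary as mpicc would make it look for mpicc-wrapper-data.txt instead.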
Some information about my host:
hostname = master
uname -m = x86_64
uname -r = 3.4.5-gentoo
uname -s = Linux
uname -v = #1 SMP Mon Jul 23 21:35:06 UTC 2012