macros - Detecting the -xarch option in the preprocessor?

Question

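I'm working with Sun Studio 12.4 and 12.5 on Solaris 11. We have a source file that provides either a straight C/C++ implementation of CRC32 or an optimized version of CRC32 using Intel intrinsics. At runtime, a function pointer is populated with the appropriate implementation.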
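For readers who want something concrete, the arrangement described above might look roughly like the following. This is only a sketch, not the project's actual code: the names crc32c_portable, cpu_has_sse42, select_crc32c and crc32c are made up for illustration, the feature probe is a stub that always answers "no", and it assumes the CRC-32C (Castagnoli) polynomial, which is the one the SSE4.2 crc32 instruction computes.

```cpp
#include <cassert>
#include <stddef.h>
#include <stdint.h>

// Portable bitwise CRC-32C (reflected polynomial 0x82F63B78).
static uint32_t crc32c_portable(const uint8_t* data, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; ++i) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; ++bit)
            crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
    }
    return crc ^ 0xFFFFFFFFu;
}

// Hypothetical CPU probe. In this sketch it is a stub that always
// reports "no SSE4.2"; a real one would query the CPU or the OS.
static bool cpu_has_sse42()
{
    return false;
}

typedef uint32_t (*crc32c_fn)(const uint8_t*, size_t);

// Pick an implementation once, based on the probe.
static crc32c_fn select_crc32c()
{
    if (cpu_has_sse42()) {
        // A real build would return the intrinsic-based routine here,
        // kept in a separate source file compiled with -xarch=sse4_2.
    }
    return crc32c_portable;
}

// Public entry point: the function pointer is filled in on first use.
uint32_t crc32c(const uint8_t* data, size_t len)
{
    static crc32c_fn impl = select_crc32c();
    assert(impl != 0);
    return impl(data, len);
}
```

Calling crc32c on the nine ASCII bytes "123456789" should give the standard CRC-32C check value 0xE3069283.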

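Testing on an x86 server with dual Xeons produces the error below, because we make code paths available based on the compiler version. SunCC 12.1 added support for SSE4 (if I am parsing the matrix correctly), so we try to enable it when __SUNPRO_CC >= 0x5100.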

"crc.cpp", line 311: ube: error: _mm_crc32_u8 intrinsic requires at least -xarch=sse4_2.

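SunCC does not define the customary GCC macros such as __SSE4_1__ and __SSE4_2__. Also, SunCC does not appear to provide intrinsics the way MS VC++ does, where the compiler version alone indicates support.

SunCC appears to enable features based on the -xarch option, but it's not clear to me how to detect that in the preprocessor. In addition, using -xarch sets some bits that cause the program to fail to execute on down-level processors (akin to a "minimum" platform).

I have two questions.

How do I detect the -xarch option in the preprocessor?
How do I disable the -xarch bits so the program can run on down-level processors?

Below is the dump of preprocessor macros from compiling with -xarch=aes. Note that nothing in it indicates which features are available.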

$ /opt/solarisstudio12.4/bin/CC -native -m64 -xarch=aes -xdumpmacros -E /dev/null 2>&1 | /usr/gnu/bin/sort --ignore-case

#1 "/dev/null"
#define __alignof__ __alignof
#define __amd64 1
#define __amd64__ 1
#define __ARRAYNEW 1
#define __asm asm
#define __asm__ asm
#define __attribute __attribute__
#define __builtin_constant_p __oracle_builtin_constant_p
#define __builtin_fpclassify __oracle_builtin_fpclassify
#define __builtin_huge_val __oracle_builtin_huge_val
#define __builtin_huge_valf __oracle_builtin_huge_valf
#define __builtin_huge_vall __oracle_builtin_huge_vall
#define __builtin_infinity __oracle_builtin_infinity
#define __builtin_isfinite __oracle_builtin_isfinite
#define __builtin_isgreater __oracle_builtin_isgreater
#define __builtin_isgreaterequal __oracle_builtin_isgreaterequal
#define __builtin_isinf __oracle_builtin_isinf
#define __builtin_isless __oracle_builtin_isless
#define __builtin_islessequal __oracle_builtin_islessequal
#define __builtin_islessgreater __oracle_builtin_islessgreater
#define __builtin_isnan __oracle_builtin_isnan
#define __builtin_isnormal __oracle_builtin_isnormal
#define __builtin_isunordered __oracle_builtin_isunordered
#define __builtin_nan __oracle_builtin_nan
#define __builtin_signbit __oracle_builtin_signbit
#define __BUILTIN_VA_STRUCT 1
#define __cplusplus 199711L
#define __DATE__ "Jul 11 2016"
#define __FILE__ 
#define __has_attribute(x) __oracle_has_attribute(x)
#define __has_nothrow_assign(x) __oracle_has_nothrow_assign(x)
#define __has_nothrow_constructor(x) __oracle_has_nothrow_constructor(x)
#define __has_nothrow_copy(x) __oracle_has_nothrow_copy(x)
#define __has_trivial_assign(x) __oracle_has_trivial_assign(x)
#define __has_trivial_constructor(x) __oracle_has_trivial_constructor(x)
#define __has_trivial_copy(x) __oracle_has_trivial_copy(x)
#define __has_trivial_destructor(x) __oracle_has_trivial_destructor(x)
#define __has_virtual_destructor(x) __oracle_has_virtual_destructor(x)
#define __is_abstract(x) __oracle_is_abstract(x)
#define __is_base_of(x,y) __oracle_is_base_of(x,y)
#define __is_class(x) __oracle_is_class(x)
#define __is_empty(x) __oracle_is_empty(x)
#define __is_enum(x) __oracle_is_enum(x)
#define __is_final(x) __oracle_is_final(x)
#define __is_literal_type(x) __oracle_is_literal_type(x)
#define __is_pod(x) __oracle_is_pod(x)
#define __is_polymorphic(x) __oracle_is_polymorphic(x)
#define __is_standard_layout(x) __oracle_is_standard_layout(x)
#define __is_trivial(x) __oracle_is_trivial(x)
#define __is_union(x) __oracle_is_union(x)
#define __LINE__ 
#define __LP64__ 1
#define __PRAGMA_REDEFINE_EXTNAME 1
#define __STDC__ 0
#define __sun 1
#define __SUN_PREFETCH 1
#define __SunOS 1
#define __SunOS_5_11 1
#define __SUNPRO_CC 0x5130
#define __SUNPRO_CC_COMPAT 5
#define __SVR4 1
#define __TIME__ "20:58:00"
#define __underlying_type(x) __oracle_underlying_type(x)
#define __unix 1
#define __volatile volatile
#define __volatile__ volatile
#define __x86_64 1
#define __x86_64__ 1
#define _BOOL 1
#define _LARGEFILE64_SOURCE 1
#define _LP64 1
#define _SIGNEDCHAR_ 1
#define _TEMPLATE_NO_EXTDEF 1
#define _WCHAR_T 
#define sun 1
#define unix 1

score 3 · Accepted Answer

For your second question:

how do I disable the -xarch bits so the program can run on down level processors?

See Chapter 7 Capability Processing of the Linkers and Libraries Guide:

https://docs.oracle.com/cd/E53394_01/html/E54813/index.html

This shows you how to deliver multiple instances of the same function which are tagged with the capability bits. The runtime linker will resolve which function is used based on the reported capabilities.

If you really want to manage the capability bits yourself, see Chapter 9 Mapfiles, in particular the section CAPABILITY Directive. It shows how to remove capabilities from the generated object.

score 2 · Accepted Answer

I believe that for your particular situation (the second part of it) the only simple way to do what you want is this: compile with an explicit "-xarch=sse4_2" (this allows the compiler to expand SSE4.2 intrinsics) and then strip the HWCAP bits down to your minimal architecture (this makes your program runnable on pre-SSE4.2 hardware).

For stripping HWCAP see: https://docs.oracle.com/cd/E23823_01/html/816-5165/elfedit-1.html

(Example 2 Removing a Hardware Capability Bit)
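Once the HWCAP bits are stripped, the loader no longer refuses to start the program on older CPUs, so the SSE4.2 code path has to be guarded by a runtime check of your own. A minimal sketch of such a check on Solaris, using getisax(2), is below. The assumptions here are that AV_386_SSE4_2 is provided through <sys/auxv.h> and lives in the first capability word; cap_has and cpu_has_sse42 are illustrative names, and on anything other than Solaris x86 the function simply answers "no".

```cpp
#include <cassert>
#include <stdint.h>

// Only Solaris on x86/x64 has the AV_386_* capability bits.
#if defined(__sun) && (defined(__i386) || defined(__x86_64) || defined(__amd64))
# include <sys/auxv.h>    // getisax(2) and, by assumption, AV_386_SSE4_2
# if defined(AV_386_SSE4_2)
#  define SKETCH_USE_GETISAX 1   // hypothetical macro, local to this sketch
# endif
#endif

// Pure helper: is `bit` set in word `index` of a capability array?
static bool cap_has(const uint32_t* words, unsigned count,
                    unsigned index, uint32_t bit)
{
    assert(words != 0 || count == 0);
    return index < count && (words[index] & bit) != 0;
}

// True when the OS reports SSE4.2 for the running CPU.
bool cpu_has_sse42()
{
#if defined(SKETCH_USE_GETISAX)
    uint32_t caps[2] = { 0, 0 };
    (void) getisax(caps, 2);                     // fills the capability words
    return cap_has(caps, 2, 0, AV_386_SSE4_2);   // assumed to be in word 0
#else
    return false;   // not Solaris x86: conservative answer
#endif
}
```

The result would be checked once, before the function pointer is pointed at the intrinsic-based routine.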

score 1 · Accepted Answer

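First, you do not want to strip the instruction-set flags from your compiled binary. When you compile with the -xarch=NNNN option, the compiler will use those instructions. If you try to run on a "lesser" processor that does not implement the instructions of the architecture given in the -xarch argument, there is a good chance your binary will not work properly.

From the Solaris Studio 12.4: C User's Guide:

1.3 Binary Compatibility Verification

On Solaris systems, beginning with Solaris Studio 11, program binaries compiled with the Oracle Solaris Studio compilers are marked with architecture hardware flags indicating the instruction sets assumed by the compiled binary. At runtime, these marker flags are checked to verify that the binary can run on the hardware it is attempting to execute on.

Running programs that do not contain these architecture hardware flags on platforms that are not enabled with the appropriate features or instruction set extensions could result in segmentation faults or incorrect results occurring without any explicit warning messages.

Note also the mention of both features and instruction sets. In my experience with Solaris documentation, that alone is enough of a warning that there may be more to it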

I don't know of any way to detect the available instruction set via the preprocessor. You may be able to get help on the Oracle forum for Solaris Studio at https://community.oracle.com/community/server_%26_storage_systems/application_development_in_c__c%2B%2B__and_fortran/developer_studio_c_c%2B%2B_fortran_compilers
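Short of real detection, one workaround is to have the build system pass a macro of its own whenever it passes -xarch, and test that macro next to the GCC-style ones. The sketch below assumes exactly that: CRC_XARCH_SSE4_2 and CRC_HAVE_SSE4_2 are made-up project macros, nothing the compiler defines, and it assumes <nmmintrin.h> is the header that declares _mm_crc32_u8 for your compiler.

```cpp
#include <cassert>
#include <stddef.h>
#include <stdint.h>

// Build line assumed for this sketch; CRC_XARCH_SSE4_2 is a hypothetical
// macro that the makefile must pass together with -xarch:
//   CC -xarch=sse4_2 -DCRC_XARCH_SSE4_2=1 -c crc.cpp
#if defined(__SSE4_2__) || defined(CRC_XARCH_SSE4_2)
# define CRC_HAVE_SSE4_2 1
# include <nmmintrin.h>   // _mm_crc32_u8
#else
# define CRC_HAVE_SSE4_2 0
#endif

// CRC-32C over a buffer: intrinsic path when the build enabled it,
// portable bitwise loop otherwise. Both paths compute the same value.
uint32_t crc32c_buffer(const uint8_t* data, size_t len)
{
    assert(data != 0 || len == 0);
    uint32_t crc = 0xFFFFFFFFu;
#if CRC_HAVE_SSE4_2
    for (size_t i = 0; i < len; ++i)
        crc = _mm_crc32_u8(crc, data[i]);
#else
    for (size_t i = 0; i < len; ++i) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; ++bit)
            crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
    }
#endif
    return crc ^ 0xFFFFFFFFu;
}
```

This only tells the preprocessor what the makefile claims, so the two flags have to be kept in sync by hand, and a file built this way still carries the capability bits discussed above.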

I suspect that even there, you won't find a way to use the preprocessor. The usual way of providing platform- and instruction-set specific implementations on Solaris is via specific shared objects. From the Solaris Linker and Libraries Guide:

Instruction Set Specific Shared Objects

The dynamic token $ISALIST is expanded at runtime to reflect the native instruction sets executable on this platform, as displayed by the utility isalist(1).

Any string name that incorporates the $ISALIST token is effectively duplicated into multiple strings. Each string is assigned one of the available instruction sets. This token is only available for filter or runpath specifications.

...

Or an application with similar dependencies is executed on an MMX configured Pentium Pro:
$ ldd -ls prog
.....
  find object=libbar.so.1; required by ./libfoo.so.1
    search path=/opt/ISV/lib/$ISALIST  (RPATH from file ./libfoo.so.1)
      trying path=/opt/ISV/lib/pentium_pro+mmx/libbar.so.1
      trying path=/opt/ISV/lib/pentium_pro/libbar.so.1
      trying path=/opt/ISV/lib/pentium+mmx/libbar.so.1
      trying path=/opt/ISV/lib/pentium/libbar.so.1
      trying path=/opt/ISV/lib/i486/libbar.so.1
      trying path=/opt/ISV/lib/i386/libbar.so.1
      trying path=/opt/ISV/lib/i86/libbar.so.1

Note how the library search starts with the "highest" instruction-set specific library, and moves to "lower" libraries. This allows for multiple instruction-set specific shared objects to be located, from "fastest specific" to "slowest generic". libc.so on Solaris does this to provide platform-specific versions of library functions such as memcpy().
