Consider a C++ file that has UNIX line endings (i.e. '\x0a' rather than "\x0d\x0a") and contains the following raw string literal:

const char foo[] = R"(hello^M
)";

(where ^M is the actual byte 0x0d, i.e. a carriage return).

What should the result of the following string comparison be, given the standard's definition of raw string literals?

strcmp("hello\r\n", foo);

Should the two strings compare equal (that is, should strcmp return 0 or a nonzero value)?

With GCC 4.8 (on Fedora 19), they compare unequal.

Is this a bug or a feature in GCC?

2 Answers

As far as the standard is concerned, you can only use members of the basic source character set in the string literals (and elsewhere in the program). How the physical representation of the program is mapped to the basic source character set is implementation-defined.

g++ apparently treats ASCII \x0A, ASCII \x0D, and ASCII \x0D\x0A as all being valid representations of the member of the basic source character set called "newline". That is entirely reasonable, given that source code transferred between Windows, Unix, and classic Mac OS machines should keep its meaning.
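
To make that mapping concrete, here is a minimal sketch of the observable effect: every \x0D\x0A pair and every lone \x0D collapses to a single \x0A. The helper name `normalize_newlines` is hypothetical, and this only mimics the behaviour described above; it is not GCC's actual implementation.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not part of GCC): maps "\r\n" and a lone "\r"
// to "\n", imitating the newline mapping described above.
std::string normalize_newlines(const std::string& in) {
    std::string out;
    out.reserve(in.size());
    for (std::size_t i = 0; i < in.size(); ++i) {
        if (in[i] == '\r') {
            // Treat "\r\n" as one line ending by skipping the '\n'.
            if (i + 1 < in.size() && in[i + 1] == '\n') {
                ++i;
            }
            out.push_back('\n');
        } else {
            out.push_back(in[i]);
        }
    }
    return out;
}
```

Under this model, the bytes `hello\x0D\x0A` inside the raw literal from the question become `hello\x0A`, which would explain why the comparison against "hello\r\n" fails.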

answered 2014-04-05T08:16:29.800

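Raw string literals are not entirely raw, because they reach your program through the compiler, which reads and interprets the input C++ file. Before strcmp'ing the two strings, you can check the size of the raw string: it differs from what you would expect by the number of ^M (\x0d) characters.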
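
The size check mentioned above could look like the following sketch. A literal carriage-return byte cannot be shown reliably here, so the raw literal below ends its line with an ordinary newline; with the behaviour reported in the question, a literal containing ^M would yield the same length. The names `raw_literal`, `raw_length`, and `expected_length` are made up for this illustration.

```cpp
#include <cstddef>

// Raw literal whose only line ending is the newline inside it.
const char raw_literal[] = R"(hello
)";

// sizeof counts the terminating '\0', so subtract one to get the length.
constexpr std::size_t raw_length = sizeof(raw_literal) - 1;
constexpr std::size_t expected_length = sizeof("hello\r\n") - 1;

// "hello" plus CR plus LF is seven characters.
static_assert(expected_length == 7, "h e l l o CR LF");
```

If `raw_length` comes out one smaller than `expected_length`, the carriage return never made it into the literal, and strcmp cannot return 0.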

You can fall back on reading the data in binary mode, for example:

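#include <fstream>
#include <vector>

std::ifstream infile("test.txt", std::ifstream::binary);  // binary mode: no newline translation
infile.seekg(0, infile.end);                              // seek to the end to measure the file
std::streamsize size = infile.tellg();
infile.seekg(0);                                          // rewind to the start
std::vector<char> buffer(size);                           // owns the bytes, so no manual delete[]
infile.read(buffer.data(), size);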

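Or you can stick with the raw literal and use a trick: replace every "bad" character in the literal with some other character, then perform the reverse substitution when you use the literal, for example: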

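#include <algorithm>
#include <cstring>
#include <iostream>
#include <string>

// '|' stands in for the carriage return that would otherwise be folded into the newline.
std::string str = R"(hello|
)";

int main()
{
  // Swap the placeholder back to '\r' (octal 015) before using the string.
  std::replace(str.begin(), str.end(), '|', '\015');
  std::cout << std::strcmp("hello\r\n", str.c_str()) << std::endl;  // prints 0
}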
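
A related variant avoids the runtime substitution altogether, relying on the fact that adjacent string literals are concatenated and that a raw literal may sit next to an ordinary one. This is only a sketch, and the names `with_crlf` and `matches_crlf` are invented for the example:

```cpp
#include <cstring>

// The raw part holds the text; the ordinary literal supplies CR LF
// through escape sequences, which the compiler does not reinterpret.
const char with_crlf[] = R"(hello)" "\r\n";

// True when the concatenated literal equals "hello\r\n".
bool matches_crlf() {
    return std::strcmp("hello\r\n", with_crlf) == 0;
}
```

The trade-off is that each line ending has to be written outside the raw part, which is less convenient for long multi-line text.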
answered 2019-09-23T12:00:45.907