c++ - String size on Windows differs from Linux

Question

I stumbled upon some strange behavior of string::substr. Normally I code on Windows 7 in Eclipse+MinGW, but when I worked on my laptop, using Eclipse on Linux (Ubuntu 12.04), I noticed that the results were different.

I'm working with a vector<string> filled with lines of text. One of the steps is to remove the last character from each line.

In Eclipse on Win7 I did:

for( int i = 0; i < (int)vectorOfLines.size(); i++ )
{
    vectorOfTrimmedLines.push_back( ((string)vectorOfLines.at(i)).substr(0, ((string)vectorOfLines.at(i)).size()-1) );
}

It works as expected (it removes the last character from each line).

But on Linux this code does not trim the line. Instead, I have to do this:

//  -2 instead of -1 characters
vectorOfTrimmedLines.push_back( ((string)vectorOfLines.at(i)).substr(0, ((string)vectorOfLines.at(i)).size()-2) );

or use another approach:

vectorOfTrimmedLines.push_back( ((string)vectorOfLines.at(i)).replace( (((string)vectorOfLines.at(i)).size()-2),1,"",0 ));

Of course, the Linux approach behaves wrongly on Windows (it trims the last two characters, or replaces the second-to-last one).

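The problem seems to be that myString.size() returns the number of characters on Windows, but on Linux it returns the number of characters + 1. Could it be that the newline character is counted on Linux?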

As a newcomer to C++ and to programming in general, I'd like to know why this happens and how to make the code platform-independent.

Another thing I'd like to know: which approach is preferable (faster), substr or replace?
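For comparison, a minimal sketch of the two ways of dropping the last character; WithoutLastChar and RemoveLastChar are hypothetical helper names, not part of the original code. substr always builds a second string. Note that in the code above even the replace variant works on a copy, because the (string) cast creates a temporary. Modifying the string in place with pop_back (C++11) or erase avoids building that second string; whether the difference matters would have to be measured on real input.

```cpp
#include <cassert>
#include <string>

// Copying approach (what substr does): returns a new string holding every
// character of `line` except the last one; `line` itself is untouched.
std::string WithoutLastChar(const std::string& line) {
    assert(!line.empty() && "nothing to remove");
    return line.substr(0, line.size() - 1);
}

// In-place approach: shortens `line` itself, so no second string is built.
// pop_back() (C++11) has the same effect as line.erase(line.size() - 1).
void RemoveLastChar(std::string& line) {
    assert(!line.empty() && "nothing to remove");
    line.pop_back();
}
```

Applied to the question's data, for (string& line : vectorOfLines) RemoveLastChar(line); would trim every line in place without a second vector. Both helpers assume the line is non-empty.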

Edit: this is the function I wrote to fill the strings:

vector< string > ReadFile( string pathToFile )
{
    //  opening file
    ifstream myFile;
    myFile.open( pathToFile.c_str() );

    //  vector of strings that is returned by this function, contains file line by line
    vector< string > vectorOfLines;

    //  check if the file is open and then read file line by line to string element of vector
    if( myFile.is_open() )
    {
        string line;    //  holds the current line read from the file

        while( getline( myFile, line ) )    //  until last line in file
        {
            vectorOfLines.push_back( line );    //  add current line to new string element in vector
        }

        myFile.close(); //  close the file
    }

    //  if the file could not be opened (e.g. it does not exist)
    else
    {
        cerr << "Unable to open file." << endl; //  report the failure
        //throw;
    }

    return vectorOfLines;   //  return vector of lines from file
}
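One way to make this function independent of where the file was written is to strip a trailing '\r' after each getline call. The sketch below assumes the only difference to absorb is CRLF versus LF (old Mac CR-only files are not handled). ReadLines and ReadFilePortable are hypothetical names; ReadLines takes any std::istream so it can be tried without a file.

```cpp
#include <cassert>
#include <fstream>
#include <iostream>
#include <sstream>   // std::istringstream, handy for trying ReadLines without a file
#include <string>
#include <vector>

// Reads `input` line by line. std::getline drops the '\n', but when a CRLF
// file is read through a stream that does not translate line endings (any
// stream on Linux, or a binary-mode stream on Windows) the '\r' is left at
// the end of the line; it is removed here so every platform sees the same text.
std::vector<std::string> ReadLines(std::istream& input) {
    std::vector<std::string> lines;
    std::string line;
    while (std::getline(input, line)) {
        if (!line.empty() && line.back() == '\r') {
            line.pop_back();
        }
        lines.push_back(line);
    }
    return lines;
}

// File-based wrapper in the spirit of the question's ReadFile (hypothetical name).
std::vector<std::string> ReadFilePortable(const std::string& pathToFile) {
    std::ifstream myFile(pathToFile);
    if (!myFile.is_open()) {
        std::cerr << "Unable to open file." << std::endl;
        return {};
    }
    return ReadLines(myFile);
}
```

With this, a file saved with either CRLF or LF endings yields the same vector on both systems, so the question's size() - 1 trimming behaves identically.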

score 9 · Accepted Answer

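Text files are not the same on different operating systems. Windows uses a two-byte sequence to mark the end of a line: 0x0D, 0x0A. Linux uses a single byte, 0x0A. Text-mode input streams (which getline and most other input functions read from) know the convention of the OS they were compiled for; when they read the character(s) the OS uses for end of line, they replace them with '\n'. So if you write a text file under Windows, its lines end with 0x0D, 0x0A; if you read that file under Linux, getline sees the 0x0D and treats it as an ordinary character, then sees the 0x0A and treats it as the end of the line. getline discards that '\n', but the 0x0D ('\r') stays at the end of the string, which is why size() is one larger on Linux.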
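A small sketch that reproduces this without needing two machines: std::istringstream performs no line-ending translation on any platform, so feeding it Windows-style text shows what getline returns for a CRLF file read on Linux. FirstLineAsRead is a hypothetical helper.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Returns the first line of `text` exactly as std::getline delivers it.
// std::istringstream never translates line endings, so this mimics reading
// a Windows-written (CRLF) file on Linux.
std::string FirstLineAsRead(const std::string& text) {
    std::istringstream stream(text);
    std::string line;
    std::getline(stream, line);  // stops at '\n' and discards it
    return line;
}
```

FirstLineAsRead("abc\r\ndef\r\n") returns "abc\r", whose size() is 4 rather than 3; the extra character is the 0x0D.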

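So the moral is that when you move text files from one system to another, you have to convert them to the local representation. FTP knows how to do this. If you are running in a virtual machine, you have to do the conversion manually when you switch systems. That is easy from the Unix command line with tr.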
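On the command line, tr -d '\r' < windows.txt > unix.txt deletes every carriage return (the file names are placeholders). The sketch below is an in-program equivalent for converting Windows text to the Linux representation, slightly more conservative than that tr call: it only removes a '\r' that directly precedes '\n'. CrlfToLf is a hypothetical name.

```cpp
#include <cassert>
#include <string>

// Converts Windows (CRLF) line endings to Unix (LF) ones by dropping every
// '\r' that is immediately followed by '\n'. A lone '\r' elsewhere is kept.
std::string CrlfToLf(const std::string& text) {
    std::string result;
    result.reserve(text.size());
    for (std::string::size_type i = 0; i < text.size(); ++i) {
        const bool crBeforeLf =
            text[i] == '\r' && i + 1 < text.size() && text[i + 1] == '\n';
        if (!crBeforeLf) {
            result += text[i];  // copy everything except the CR of a CRLF pair
        }
    }
    return result;
}
```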

score 4 · Accepted Answer

This is because on Windows a line break is represented by two characters, CR+LF, while on Linux it is just LF, and on Mac (before OS X) it is just CR.

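As long as you only use files created on Linux on a Linux system, or files created on Windows on a Windows system, you don't have to worry about this. But as soon as you need to use a file created on Linux on Windows, or vice versa, you have to handle line breaks properly.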

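As a first step, you need to open the file in binary mode, e.g. std::ifstream infile("filename", std::ios_base::binary);, so that no line-ending translation happens on any platform. Then you have three options: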

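1. Settle on one line-break convention for all platforms and use it consistently.
2. Detect the line-break convention used in the current file (usually by checking the line ending of the first line), store it in a variable, and pass it to the string functions that need to deal with line breaks.
3. Tell the user to convert the file to the correct line breaks, e.g. with dos2unix and unix2dos, or by using ASCII mode if the file is transferred via FTP.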
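A minimal sketch of the second option, detecting the convention from the first line ending. It assumes the stream was opened in binary mode as described above, so the raw bytes are still visible; Newline and DetectNewline are hypothetical names.

```cpp
#include <cassert>
#include <istream>
#include <sstream>   // std::istringstream, handy for trying DetectNewline without a file

// Possible line-ending conventions (hypothetical type for this sketch).
enum class Newline { CrLf, Lf, Cr, Unknown };

// Inspects the first line ending in `input`, which should have been opened
// in binary mode so that no CRLF translation has happened yet.
Newline DetectNewline(std::istream& input) {
    char c;
    while (input.get(c)) {
        if (c == '\n') return Newline::Lf;
        if (c == '\r') {
            // CR followed by LF is Windows; a CR on its own is classic Mac OS.
            return input.peek() == '\n' ? Newline::CrLf : Newline::Cr;
        }
    }
    return Newline::Unknown;  // no line ending found (e.g. a single-line file)
}
```

DetectNewline consumes characters, so rewind before reading the file for real, e.g. infile.clear(); infile.seekg(0);.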

Or, as already mentioned, use Boost.

score 0 · Accepted Answer

Line endings differ between Windows and Linux/Unix: Windows uses two bytes, Linux uses one. Google how to use tr on the *nix command line and you'll see how to convert them.

Good luck!
