c++ - 对字符串向量进行排序：普通 C 与惯用 C++11

Question

我目前正在尝试学习 C++11 及其花哨的功能。具体来说，我正在寻找高效的通用性。所以我很高兴地用 C++11 编写了一个程序来对输入文件的行进行排序，以测试我的新技能。由于 C++ 编译器的内联和很好的特性，我期望这个小例子有高性能。为了了解我的程序有多快，我使用该函数在 C 中破解了完全相同的程序qsort，因为它是原始 C，因此没有对此函数执行内联，并且我的比较函数被间接调用，并且需要执行两个间接访问char *表示字符串的指针。

事实

然而，我对结果感到非常惊讶，C 似乎比 C++ 快 4 倍。在 8Mb 文件上，我得到以下结果：

$ g++ -O3 -std=c++11 -o sort sort.C
$ time ./sort < huge > /dev/null

real    0m0.415s
user    0m0.397s
sys     0m0.013s

$ cc -O3 -Wall -o sortc sort.c
$ time ./sortc < huge  > /dev/null

real    0m0.104s
user    0m0.097s
sys     0m0.010s

$ wc -l huge
140190 huge

请注意，我尽量做到公平，编译选项相同，我的 C 程序（稍后转储）的行为方式与 C++ 程序相同：输入行的大小没有限制，输入的数量没有限制线。

我还注意到，虽然我的 C 程序malloc几乎为每个输入行调用一次，但 C++ 程序的每个输入行分配比例为 10！

编码

这是我用来比较的两个程序。

#include <iostream>
#include <string>
#include <vector>
#include <algorithm>
#include <memory>

int main () {
    typedef std::vector<std::string> svec;
    svec a;
    std::string s;

    for (;;) {
        getline(std::cin, s);
        if (std::cin.eof()) {
            if (s != "")
                a.push_back(std::move(s));
            break;
        }
        a.push_back(std::move(s));
    }
    std::sort(a.begin(), a.end());
    for (std::string &s : a) {
        std::cout << s << "\n";
    }
}

还有我更冗长的 C 版本。

#include <stdio.h>
#include <string.h>
#include <stdlib.h>

#define BUFSZ 100
size_t getl(char **line, size_t len) {
        char buf[BUFSZ];
        size_t i, n;

        for (i=0; i<BUFSZ; i++) {
                int c = getchar();

                if (c == EOF || c == '\n') {
                        *line = malloc(len+i+1);
                        memcpy(&(*line)[len], buf, i);
                        (*line)[len+i] = 0;
                        return i;
                }
                buf[i] = c;
        }

        n = getl(line, len+i);
        memcpy(&(*line)[len], buf, i);
        return i+n;
}

#define ARRAYSZ 30
struct Array {
        char **lv;
        size_t li, lc;
};

void addline(struct Array *a, char *line) {
        if (a->li == a->lc) {
                a->lc *= 2;
                a->lv = realloc(a->lv, a->lc * sizeof *a->lv);
        }
        a->lv[a->li++] = line;
}

int cmp(const void *a, const void *b) {
        return strcmp(*(const char **)a, *(const char **)b);
}

int main(void) {
        char *line;
        struct Array a;
        size_t i;

        a.li = 0;
        a.lc = ARRAYSZ;
        a.lv = malloc(a.lc * sizeof *a.lv);

        for (;;) {
                getl(&line, 0);
                if (feof(stdin)) {
                        if (line[0] != 0)
                                addline(&a, line);
                        else
                                free(line);
                        break;
                }
                addline(&a, line);
        }
        qsort(a.lv, a.li, sizeof *a.lv, cmp);
        for (i=0; i<a.li; i++) {
                printf("%s\n", a.lv[i]);
                free(a.lv[i]);
        }
        free(a.lv);
        return 0;
}

问题

有人能告诉我必须在哪里更改我的 C++ 程序（而不是变成纯 C）以更快吗？我试图保持非常地道，这是破解 C++ 的好方法，还是当我想要高性能时应该倾向于编写类似 C 的代码？为什么 C++ 程序在堆上分配那么多，我怎样才能减少这个？

编辑

根据大众的需求，我展示了我的 C++ 程序的分析结果。这是我的 C++ 程序的分析器的有趣输出（前两行）：

Each sample counts as 0.01 seconds.
 %   cumulative   self              self     total           
time   seconds   seconds    calls  ms/call  ms/call  name    
40.03      0.02     0.02  1198484     0.00     0.00  __gnu_cxx::__normal_iterator<std::string*, std::vector<std::string, std::allocator<std::string> > >::operator--()
30.02      0.04     0.02  2206802     0.00     0.00  bool std::operator< <char, std::char_traits<char>, std::allocator<char> >(std::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)

当我读到它时，似乎分配不是唯一的原因。

score 27 · Accepted Answer

原因在于 c++ std io 同步。以下代码：

int main () {
    typedef std::vector<std::string> svec;
    svec a;
    std::string s;

    // note
    std::ios_base::sync_with_stdio(false);

    for (;;) {
    getline(std::cin, s);
    if (std::cin.eof()) {
        if (s != "")
            a.push_back(std::move(s));
        break;
    }
        a.push_back(std::move(s));
    }
    std::sort(a.begin(), a.end());
    for (std::string &s : a) {
        std::cout << s << "\n";
    }
}

给

 real   0m0.106s
 user   0m0.104s
 sys    0m0.004s

C版本给出了这个：

 real   0m0.167s
 user   0m0.164s
 sys    0m0.000s

编辑：正如 RiaD 正确提到sync_with_stdio的当然是静态函数，所以它足以为所有 std io 流调用一次函数。

score 11 · Accepted Answer

您还使用了两个不同的 I/O 库。这将完全搞砸任何时序信息，因为 C 和 C++ I/O 库非常不同。IOstreams 显然不是为速度而设计的。

此外，I/O 是完全不同步的。如果 I/O 数据源只是碰巧有一次进入较慢，则无论排序时间如何，一个程序似乎都会变慢 - 例如，如果操作系统碰巧将它放在缓存中以供一次运行，但在另一次运行时没有.

std::vector<std::string>比如说，您需要计算排序一个预先存在的时间所花费的时间。

哦，是的，你getl的内存泄漏。

score 7 · Accepted Answer

我的猜测是您不测量排序速度，而是测量内存重新分配。不要一次做a.push_back一个元素，而是尝试像使用 C 程序一样预先分配向量内存

a.reserve(num_lines);

根据您的编译器是否使用扩展因子为1.5(VC++) 或2(g++) 的重新分配，您可以使用文件中的行29进行17重新分配140,190（即log(total lines) / log(expansion factor)）。

R. Martinho Fernandes 的评论也一针见血：在两个程序中使用std::chrono::high_resolution_clock::now()围绕sort语句来获得时间差异。这将您与内存和 IO 差异隔离开来。

score 5 · Accepted Answer

其他人已经注意到，您测量的大部分内容是 I/O 库的速度。然而，我认为值得注意的是，与这里所做的一些陈述相反，C++ iostream 可以与使用 CFILE *的 I/O 完全竞争。所以，你测量的主要是“gcc 的 iostreams 有多糟糕？”，而不是“一般来说 iostreams 有多糟糕？”

例如，我首先将我在一个目录中的所有 .txt 文件连接在一起，以创建一个相当大的文本文件。然后，我使用 VC++ 10 编译了您的 C 代码，并使用它对该文件进行排序（将输出写入另一个文件）。运行时间为 3.2 秒。

我还写了我认为合理惯用的 C++ 来完成相同的任务：

#include <iostream>
#include <vector>
#include <algorithm>
#include <string>
#include <iterator>

class line { 
    std::string data;
public:
    friend std::istream &operator>>(std::istream &is, line &l) { 
        return std::getline(is, l.data);
    }
    operator std::string() const { return data; }
};

int main() { 
    std::vector<std::string> vec(
        (std::istream_iterator<line>(std::cin)),
        std::istream_iterator<line>());

    std::sort(vec.begin(), vec.end());
    std::copy(vec.begin(), vec.end(), 
        std::ostream_iterator<std::string>(std::cout, "\n"));

    return 0;
}

这与您的 C++ 代码的长度大致相同，但（我声称）比您的任何一个程序都简单。如果你真的在乎，让它更短一点会很容易。它根本没有特别尝试优化。使用 VC++ 10 编译（与我在 C 代码中使用的优化标志相同-O2b2 -GL），它在 2.8 秒内运行，比 C 代码快大约 10%。

我希望如果你在 Linux 上运行它，你会发现它比你的 C 代码慢。添加这两个sync_with_stdio(false);调用可能会解决这个问题，就像他们对您的 C++ 代码所做的那样。这些sync_with_stdio(false);调用通常在 Linux 上产生相当大的差异，但我从来没有能够衡量在 Windows 上使用它们的任何改进（使用我尝试过的任何编译器——最近的 VC++ 和 MinGW，以及更早之前的英特尔、Comeau 、CygWin 和 Borland 也是如此）。

score 2 · Accepted Answer

除了 I/O 问题（并不是说这些不是真实的），这两种方法做两种不同的事情。C++ 版本移动string对象，C 版本移动指针。后者通常（见下文）会更快。重写 C++ 代码以移动指针而不是对象，即使用 a std::vector<std::string*>，并显式地新建每个string对象。是的，这不是惯用的，但这是一个更公平的速度比较。

如果std::sort是移动感知的，则移动琴弦会变得更快，并且std::vector<std::string>不会受到任何影响。但是移动语义在 C++11 中是新的，即使你有 C++11 编译器，移动语义也可能没有迁移到std::sort.

c++ - 对字符串向量进行排序：普通 C 与惯用 C++11

事实

编码

问题

编辑

5 回答 5

Related

Reference