c++ - 提高 std::sort 性能

Question

我有一个包含大约 10k 个项目的索引，这些项目必须按字典顺序不区分大小写。

这是我的方法：

bool lowercomp (AbstractServiceProvider::AbstractItem*  i, AbstractServiceProvider::AbstractItem* j)

{
    std::string a,b;

    // lower first string
    a.resize(i->title().length());
    std::transform(i->title().cbegin(), i->title().cend(), a.begin(),
                std::bind2nd(std::ptr_fun(&std::tolower<char>), std::locale("")));

    // lower 2nd string
    b.resize(j->title().length());
    std::transform(j->title().cbegin(), j->title().cend(), b.begin(),
                std::bind2nd(std::ptr_fun(&std::tolower<char>), std::locale("")));

    return 0 > a.compare(b);
}

在我的代码中某处：

t = new boost::timer::auto_cpu_timer;
std::sort(_index.begin(), _index.end(), lowercomp);
delete t;

但这需要大约 4 秒。如果没有 toLower 部分，大约需要 0.003 秒。有没有办法改善这一点？

score 4 · Accepted Answer

你绝对可以让它更快。解决方案是避免分配内存，而是通过在进行比较时使用 tolower() 一次转换一个字符来不区分大小写地比较字符串。在比较函数中不应该构造类对象。像这样的东西：

bool lowercomp(const AbstractItem* lhs, const AbstractItem* rhs)  
{
    size_t size = std::min(lhs->title().size(), rhs->title().size());
    for (size_t pos = 0; pos < size; ++pos) {
        if (tolower(lhs->title()[pos]) < tolower(rhs->title()[pos]) {
            return true;
        } else if (tolower(lhs->title()[pos]) > tolower(rhs->title()[pos]) {
            return false;
        }
    }
    return lhs->title().size() < rhs->title().size();
}

让我们知道那有多快。:)

score 4 · Accepted Answer

在您看到分析器输出之前，要知道减速在哪里，您无法确定，但有许多点似乎可能导致我减速。最重要的两个是：

您的函数在每次调用时创建两个新字符串。这可能非常昂贵，而且
std::tolower您使用;的两个操作数形式每次调用此函数时都必须提取ctypefacet（并且每次调用lowercomp.

我自己的偏好是使用功能对象进行比较。使用一些编译器，它更快，但在这种情况下，它也更干净：

class CaseInsensitiveCompare
{
    std::locale myLocale;   //  To ensure lifetime of the facet.
    std::ctype<char> const& myCType;
public:
    CaseInsensitiveCompare( std::locale const& locale = std::locale( "" ) )
        : myLocale( locale )
        , myCType( std::use_facet<std::ctype<char>>( myLocal ) )
    {
    }
    bool operator()( AbstractItem const* lhs, AbstractItem const* rhs ) const
    {
        return (*this)( lhs->title(), rhs->title() );
    }
    bool operator()( std::string const& lhs, std::string const& rhs ) const
    {
        return std::lexicographical_compare(
            lhs.begin(), lhs.end(),
            rhs.begin(), rhs.end(),
            *this);
    }
    bool operator()( char lhs, char rhs ) const
    {
        return myCType.tolower(lhs) < myCType.tolower(rhs);
    }
};

除此之外，还有其他几点可能会提高性能：

如果您确定locale您正在使用的生命周期（并且您通常可以），请将该myLocale成员放入班级；复制语言环境将是复制此类实例最昂贵的部分（并且调用 lexicographical_compare将至少复制一次）。
如果您不需要本地化功能，请考虑使用 in 中的tolower功能<cctype>，而不是 in 中的功能<locale>。这将完全避免在比较中需要任何数据成员。
最后，虽然我不确定对于小到 10K 的项目是否值得，但您可以考虑使用字符串的规范形式（已经小写等）制作向量，仅使用<字符串对它们进行排序，然后据此重新排序原始向量。

另外，我非常怀疑`new boost::timer::auto_cpu_timer'。你真的需要在这里动态分配吗？顺便说一下，我怀疑局部变量会更合适。

score 0 · Accepted Answer

你的实施让我觉得非常低效。我看到几个问题。

您正在对排序比较器中的两个tolower字符串执行 a 。由于此比较器将按时间顺序调用，因此您将成为两个字符串，每个字符串大约 40K 次（？）。n log ntolowering

我根本不想比较字符串。字符串比较不仅比其他方法（例如积分比较）效率低几个数量级，而且还容易出错并且需要您清理数据——另一个低效率的来源。

但是，如果您必须比较字符串，请在执行排序之前清理它们。这包括tolower他们。理想情况下，在元素构造时擦洗数据。除此之外，您甚至可以在打电话之前擦洗它sort。无论您做什么，都不要在比较器内擦洗它。

c++ - 提高 std::sort 性能

3 回答 3

Related

Reference