data-structures - Ordered string container with constant insertion time?

Question

What I need is an ordered, associative container with string keys, valued by a vector of numbers. Further, I need O(1) insertion time.

My description sounds abstract, so here is a scenario:

There is an online test. When a person takes this test, his name is added to a database. People may take the test repeatedly if they like. All their scores are recorded under their name (which is unique). For example, David, Tom, and Alice come and take the exam a few times. The program should be able to print the output in the following format:

Alice 65 70 84
David 98 97 93
Tom   100 45 
...

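As you can see, their names should be printed in lexicographic order. Every time a person takes the test, his score is added to the database. Since a lot of people will come to take the test, this must be O(1) time complexity. But printing out the database also happens often; say, once every second. It would therefore be advantageous not to have to sort explicitly on every display.

What data structure can I use here? (STL is preferred.) I am currently using an unordered_map, because it gives me O(1) insertion, but it cannot iterate over the keys in lexicographic order.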

score 4 · Accepted Answer

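A container with O(1) insertion that gives ordered iteration in O(n) could be used to sort strings in linear time.

We can immediately conclude that this container cannot work using comparators alone, because comparator-based sorting has a lower bound of Ω(n log n) comparisons.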
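
The lower bound follows from the standard decision-tree argument. A brief sketch, given here as background rather than as part of the original answer:

```latex
% A comparison sort on n distinct keys must distinguish all n! orderings.
% A binary decision tree with n! leaves has height h satisfying
\[
  h \;\ge\; \log_2(n!)
    \;\ge\; \log_2\!\left(\left(\tfrac{n}{2}\right)^{n/2}\right)
    \;=\; \tfrac{n}{2}\,\log_2\tfrac{n}{2}
    \;=\; \Omega(n \log n).
\]
```

So n insertions followed by one ordered traversal cannot cost O(n) in total if the container only ever compares keys.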

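Only a few sorting algorithms can sort in linear time, and they generally require knowledge of your key space to work. An incremental "online" version of such an algorithm may work for you; the incrementally built internal data it uses would then become your container.

Here is a discussion of linear-time sorting algorithms.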
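
One way to read the "online" idea is a trie, which behaves like an incrementally built radix sort. The following is a minimal sketch under that assumption, not the answerer's code; `ScoreTrie`, `addScore`, and `sorted` are hypothetical names. Insertion walks one node per character, so its cost depends on the length of the name rather than on how many names are stored. Ordering is byte-wise, so uppercase letters sort before lowercase ones.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical trie-backed container: name -> list of scores.
class ScoreTrie
{
    struct Node
    {
        // Children are ordered by byte value, so a depth-first walk
        // visits the stored names in lexicographic (byte-wise) order.
        std::map<unsigned char, std::unique_ptr<Node> > children;
        bool is_key = false;      // true if some name ends at this node
        std::vector<int> scores;  // scores recorded for that name
    };

public:
    typedef std::vector<std::pair<std::string, std::vector<int> > > Listing;

    // One step per character; at most 256 children per node, so the
    // cost per step is bounded regardless of the number of names.
    void addScore (std::string const & name, int score)
    {
        Node * node = &m_root;
        for (unsigned char c : name)
        {
            std::unique_ptr<Node> & child = node->children[c];
            if (!child)
                child.reset (new Node);
            node = child.get();
        }
        node->is_key = true;
        node->scores.push_back (score);
    }

    // Returns (name, scores) pairs in lexicographic order; no sort step.
    Listing sorted () const
    {
        Listing out;
        std::string prefix;
        collect (m_root, prefix, out);
        return out;
    }

private:
    static void collect (Node const & node, std::string & prefix, Listing & out)
    {
        if (node.is_key)
            out.emplace_back (prefix, node.scores);
        for (auto const & entry : node.children)
        {
            assert (entry.second);  // stored child pointers are never null
            prefix.push_back (static_cast<char>(entry.first));
            collect (*entry.second, prefix, out);
            prefix.pop_back();
        }
    }

    Node m_root;
};
```

The trade-off is memory and pointer chasing: every character of every distinct prefix gets its own node, so this only pays off when the number of names is large compared to their length.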

score 1 · Accepted Answer

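Considering that truly O(1) operations are a very complex problem in this situation, and that you prefer using the STL, I would suggest something like the following data structure. Note that this does not satisfy your Big-Oh requirement, and it is not the most efficient way to implement this even with the STL, but it is simple and reasonably efficient.

Your main data structure is a std::map. But to speed lookups up to O(1), you can add a std::unordered_map like this: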

using std::map;
using std::string;
using std::unordered_map;
using std::vector;

typedef map<string, vector<int> > TestTakersMap;
typedef unordered_map<string, TestTakersMap::iterator> LookupAccelerator;

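Now your various operations would be:

Add a new person: you insert the record into the map, and you also insert the name, together with the iterator to the new map entry, into the unordered_map. O(log(N)).
Look up a person: using the unordered_map, you get the iterator to that person's entry and therefore their data. Expected O(1).
Add a new score: you find the person using the unordered_map, then use the iterator you got to append the new score. Amortized O(1).
Print all names and scores: you iterate over the map itself and get the names in lexicographic order, with no additional sorting step. This can be considered O(S), where S is the total number of scores for all participants.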
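
The operations above can be sketched as free functions over the two typedefs. This is an illustrative sketch only; `addPerson`, `addScore`, and `formatAll` are hypothetical helper names. Storing map iterators in the hash table is safe because std::map iterators remain valid when other elements are inserted.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <sstream>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

typedef std::map<std::string, std::vector<int> > TestTakersMap;
typedef std::unordered_map<std::string, TestTakersMap::iterator> LookupAccelerator;

// Add a new person: O(log N) map insert plus expected O(1) hash insert.
inline void addPerson (TestTakersMap & takers, LookupAccelerator & lookup,
                       std::string const & name)
{
    if (lookup.find (name) != lookup.end())
        return;  // already registered
    TestTakersMap::iterator it =
        takers.insert (std::make_pair (name, std::vector<int>())).first;
    lookup.insert (std::make_pair (name, it));
}

// Add a new score: expected O(1) hash lookup, amortized O(1) push_back.
// An unknown name is registered first (that one call costs O(log N)).
inline void addScore (TestTakersMap & takers, LookupAccelerator & lookup,
                      std::string const & name, int score)
{
    addPerson (takers, lookup, name);
    LookupAccelerator::iterator found = lookup.find (name);
    assert (found != lookup.end());
    found->second->second.push_back (score);
}

// Print everything: iterating the map yields names in lexicographic
// order, so no sorting step is needed. One line per person.
inline std::string formatAll (TestTakersMap const & takers)
{
    std::ostringstream out;
    for (TestTakersMap::const_iterator p = takers.begin(); p != takers.end(); ++p)
    {
        out << p->first;
        for (std::size_t i = 0; i < p->second.size(); ++i)
            out << ' ' << p->second[i];
        out << '\n';
    }
    return out.str();
}
```

Only the first score for a new name pays the O(log N) map insertion; every later score for that name goes through the hash table alone.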

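Note that in all of this your bottleneck will be your cache, and all this pointer chasing and jumping around in memory does not help you at all. Of course, this depends on several other factors, e.g. how many names you actually have, how many scores there are per person, how often you add a new person, how often you add a new score, how often you print all the names and scores, how much data you have and need per person and per test, and so on.

Update: You can do basic operations like the following. The includes and using declarations look like this: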

#include <map>
#include <string>
#include <unordered_map>
#include <vector>

using std::map;
using std::string;
using std::unordered_map;
using std::vector;

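This is a very simple class that does some of the operations you want. Note that I'm using C++11 features (auto, emplace, ...), but don't take this code as particularly good programming style; I can't vouch for that.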

class TestScores
{
private:
    typedef int ScoreType;
    typedef vector<ScoreType> ScoreList;
    typedef map<string, ScoreList> TestTakersMap;
    typedef unordered_map<string, TestTakersMap::iterator> LookupAccelerator;

public:
    bool hasName (string const & new_name) const
    {
        return m_lookup.end() != m_lookup.find (new_name);
    }

    // Returns true if the name is really new
    bool addName (string const & new_name)
    {
        if (hasName(new_name))
            return false; // name already in there

        auto i = m_takers.emplace (new_name, ScoreList()).first;
        m_lookup.emplace (new_name, i);

        return true;
    }

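    // Read-only access. An unknown name yields an empty list instead of
    // being inserted, so a const object is never modified. (A private
    // non-const overload of the same name would also be selected for
    // non-const callers and fail the access check.)
    ScoreList const & getScores (string const & name) const
    {
        static ScoreList const empty;
        auto i = m_lookup.find (name);
        if (i == m_lookup.end())
            return empty;
        return i->second->second;
    }

    void addScore (string const & name, ScoreType new_score)
    {
        getOrAddScores(name).push_back (new_score);
    }

private:
    // If the name doesn't already exist, it is added!
    ScoreList & getOrAddScores (string const & name)
    {
        if (!hasName(name))
            addName (name);

        return m_lookup[name]->second;
    }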

private:
    TestTakersMap m_takers;
    LookupAccelerator m_lookup;
};

score 0 · Accepted Answer

If your data set really is very large and you absolutely need efficient insertion, lookup, and lexicographical iteration, you can check out Judy arrays. Judy arrays are fast, memory-efficient, trie-like associative data structures.

You can check out these two implementations:

A larger implementation from HP; seems to be under the LGPL.
A smaller implementation on Google Code; under the new BSD license.
