c++ - A std::map that keeps track of the order of insertion?

Question

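I currently have a std::map<std::string,int> that stores an integer value for each unique string identifier, and I look values up by the string. It does mostly what I want, except that it does not keep track of the insertion order. So when I iterate over the map to print out the values, they are sorted by string; but I want them sorted by the order of (first) insertion.

I thought about using a vector<pair<string,int>> instead, but I need to look up the string and increment the integer value about 10,000,000 times, so I don't know whether a std::vector would be significantly slower.

Is there a way to do this with std::map, or is there another std container that better suits my needs?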

I'm on GCC 3.4, and my std::map probably holds no more than 50 pairs of values.

score 62 · Accepted Answer

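If you have only 50 values in the std::map, you could copy them to a std::vector before printing and sort them with std::sort using an appropriate functor.

Or you could use boost::multi_index. It allows several indexes on the same data. In your case it could look like the following: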

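#include <boost/multi_index_container.hpp>
#include <boost/multi_index/random_access_index.hpp>
#include <boost/multi_index/hashed_index.hpp>
#include <boost/multi_index/member.hpp>
#include <string>

using namespace boost::multi_index;

struct value_t {
    std::string s;
    int         i;
};

struct string_tag {};

typedef multi_index_container<
    value_t,
    indexed_by<
        random_access<>, // this index represents insertion order
        hashed_unique< tag<string_tag>, member<value_t, std::string, &value_t::s> >
    >
> values_t;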

score 33 · Accepted Answer

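You could combine a std::vector with a std::tr1::unordered_map (a hash table). See the Boost documentation for unordered_map. Use the vector to keep track of the insertion order and the hash table for the frequent lookups. If you are doing hundreds of thousands of lookups, the difference between the O(log n) lookup of std::map and the O(1) lookup of a hash table might be significant.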

#include <iostream>
#include <string>
#include <vector>
#include <tr1/unordered_map> // GCC's TR1 header; with C++11 use <unordered_map> and std::unordered_map

int main()
{
    std::vector<std::string> insertOrder;
    std::tr1::unordered_map<std::string, long> myTable;

    // Initialize the hash table and record insert order.
    myTable["foo"] = 0;
    insertOrder.push_back("foo");
    myTable["bar"] = 0;
    insertOrder.push_back("bar");
    myTable["baz"] = 0;
    insertOrder.push_back("baz");

    /* Increment things in myTable 100000 times */

    // Print the final results.
    for (std::size_t i = 0; i < insertOrder.size(); ++i)
    {
        const std::string &s = insertOrder[i];
        std::cout << s << ' ' << myTable[s] << '\n';
    }
    return 0;
}

score 17 · Accepted Answer

Tessil has a very nice implementation of an ordered map (and set) under the MIT license. You can find it here: ordered-map

Map example

#include <iostream>
#include <string>
#include <cstdlib>
#include "ordered_map.h"

int main() {
    tsl::ordered_map<char, int> map = {{'d', 1}, {'a', 2}, {'g', 3}};
    map.insert({'b', 4});
    map['h'] = 5;
    map['e'] = 6;

    map.erase('a');

    // {d, 1} {g, 3} {b, 4} {h, 5} {e, 6}
    for(const auto& key_value : map) {
        std::cout << "{" << key_value.first << ", " << key_value.second << "}" << std::endl;
    }

    map.unordered_erase('b');

    // Break order: {d, 1} {g, 3} {e, 6} {h, 5}
    for(const auto& key_value : map) {
        std::cout << "{" << key_value.first << ", " << key_value.second << "}" << std::endl;
    }
}

score 15 · Accepted Answer

Keep a parallel list<string> insertionOrder.

When it is time to print, iterate over the list and look each key up in the map.

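// Walk the list in insertion order, but look each value up in the map.
for (std::list<std::string>::const_iterator it = insertionOrder.begin();
     it != insertionOrder.end(); ++it)
    std::cout << *it << ' ' << map[*it] << '\n';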
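
As a rough sketch of the parallel-list idea, the struct below records a key in the list only the first time it is seen, which matches the "(first) insertion" order the question asks for. The names Counter, increment, and inInsertionOrder are made up for this illustration and do not come from the answer.

```cpp
#include <list>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: per-key counts plus a list of keys in first-insertion order.
struct Counter {
    std::map<std::string, int> counts;
    std::list<std::string> insertionOrder;

    // Increment the count for key; remember the key only the first time it appears.
    void increment(const std::string& key) {
        std::map<std::string, int>::iterator it = counts.find(key);
        if (it == counts.end()) {
            counts[key] = 1;
            insertionOrder.push_back(key);
        } else {
            ++it->second;
        }
    }

    // Walk the list (insertion order) and look each key up in the map.
    std::vector<std::pair<std::string, int> > inInsertionOrder() const {
        std::vector<std::pair<std::string, int> > out;
        for (std::list<std::string>::const_iterator it = insertionOrder.begin();
             it != insertionOrder.end(); ++it) {
            out.push_back(std::make_pair(*it, counts.find(*it)->second));
        }
        return out;
    }
};
```

Calling increment("zebra"), increment("apple"), increment("zebra") and then inInsertionOrder() yields zebra with a count of 2 followed by apple with a count of 1, even though the map itself sorts apple first.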

score 4 · Accepted Answer

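If you need both lookup strategies, you will end up with two containers. You can use a vector holding your actual values (the ints) and put a map<string, vector<T>::difference_type> next to it that returns the index into the vector.

To round it all off, you can encapsulate both in one class.

But I believe Boost has a container with multiple indices.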
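
One possible shape for such a class is sketched below. It is only an illustration: the name OrderedCounts and its member functions are invented here, and std::size_t is used for the index where the answer mentions vector<T>::difference_type.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical wrapper: values live in a vector (insertion order),
// and a map translates each key into an index into that vector.
class OrderedCounts {
public:
    // Returns a reference to the value for key, creating it (as 0) on first use.
    // The reference may be invalidated by a later insertion of a new key.
    int& operator[](const std::string& key) {
        std::map<std::string, std::size_t>::iterator it = index_.find(key);
        if (it == index_.end()) {
            it = index_.insert(std::make_pair(key, values_.size())).first;
            keys_.push_back(key);
            values_.push_back(0);
        }
        return values_[it->second];
    }

    std::size_t size() const { return values_.size(); }
    const std::string& keyAt(std::size_t i) const { return keys_[i]; }
    int valueAt(std::size_t i) const { return values_[i]; }

private:
    std::map<std::string, std::size_t> index_; // key -> position in the vectors
    std::vector<std::string> keys_;            // keys in insertion order
    std::vector<int> values_;                  // values in insertion order
};
```

Lookups and increments go through operator[] (one O(log n) map lookup), while printing in insertion order is a plain loop from 0 to size() over keyAt(i) and valueAt(i).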

score 3 · Accepted Answer

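What you want (without resorting to Boost) is what I call an "ordered hash", which is essentially a mashup of a hash and a linked list with string or integer keys (or both at the same time). An ordered hash maintains the order of the elements during iteration with the absolute performance of a hash.

I've been putting together a relatively new C++ snippet library that fills in what I view as holes in the C++ language for C++ library developers. Go here:

https://github.com/cubiclesoft/cross-platform-cpp

Grab: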

templates/detachable_ordered_hash.cpp
templates/detachable_ordered_hash.h
templates/detachable_ordered_hash_util.h

If user-controlled data will be placed into the hash, you might also want:

security/security_csprng.cpp
security/security_csprng.h

Invoke it:

#include "templates/detachable_ordered_hash.h"
...
// The 47 is the nearest prime to a power of two
// that is close to your data size.
//
// If your brain hurts, just use the lookup table
// in 'detachable_ordered_hash.cpp'.
//
// If you don't care about some minimal memory thrashing,
// just use a value of 3.  It'll auto-resize itself.
int y;
CubicleSoft::OrderedHash<int> TempHash(47);
// If you need a secure hash (many hashes are vulnerable
// to DoS attacks), pass in two randomly selected 64-bit
// integer keys.  Construct with CSPRNG.
// CubicleSoft::OrderedHash<int> TempHash(47, Key1, Key2);
CubicleSoft::OrderedHashNode<int> *Node;
...
// Push() for string keys takes a pointer to the string,
// its length, and the value to store.  The new node is
// pushed onto the end of the linked list and wherever it
// goes in the hash.
y = 80;
TempHash.Push("key1", 5, y++);
TempHash.Push("key22", 6, y++);
TempHash.Push("key3", 5, y++);
// Adding an integer key into the same hash just for kicks.
TempHash.Push(12345, y++);
...
// Finding a node and modifying its value.
Node = TempHash.Find("key1", 5);
Node->Value = y++;
...
Node = TempHash.FirstList();
while (Node != NULL)
{
  if (Node->GetStrKey())  printf("%s => %d\n", Node->GetStrKey(), Node->Value);
  else  printf("%d => %d\n", (int)Node->GetIntKey(), Node->Value);

  Node = Node->NextList();
}

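During my research phase, I ran into this SO thread while checking whether anything like OrderedHash already existed without requiring me to drop in a massive library. I was disappointed. So I wrote my own. And now I've shared it.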
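
The general "hash plus linked list" idea can also be sketched with standard containers. The class below is not the CubicleSoft API; SimpleOrderedHash is a hypothetical name, the value type is fixed to int for brevity, and a C++11 compiler is assumed for std::unordered_map.

```cpp
#include <cstddef>
#include <iterator>
#include <list>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical sketch: a linked list keeps (key, value) nodes in insertion
// order; a hash table maps each key to its node for O(1) average lookup.
class SimpleOrderedHash {
public:
    typedef std::list<std::pair<std::string, int> > List;
    typedef List::const_iterator const_iterator;

    // Find the entry for key, appending a zero-valued one if it is new.
    int& operator[](const std::string& key) {
        auto it = index_.find(key);
        if (it == index_.end()) {
            items_.push_back(std::make_pair(key, 0));
            it = index_.insert(std::make_pair(key, std::prev(items_.end()))).first;
        }
        return it->second->second;
    }

    // Remove key if present. List iterators stay valid when other nodes are
    // erased, so no scan is needed and the order of the rest is preserved.
    bool erase(const std::string& key) {
        auto it = index_.find(key);
        if (it == index_.end()) return false;
        items_.erase(it->second);
        index_.erase(it);
        return true;
    }

    const_iterator begin() const { return items_.begin(); }
    const_iterator end() const { return items_.end(); }
    std::size_t size() const { return items_.size(); }

private:
    List items_;
    std::unordered_map<std::string, List::iterator> index_;
};
```

Iterating from begin() to end() visits the entries in insertion order, and erasing an entry leaves the relative order of the remaining ones unchanged.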

score 2 · Accepted Answer

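Another way to implement this is with a map instead of a vector. I will show you this approach and discuss the differences:

Just create a class that has two maps behind the scenes.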

#include <map>
#include <string>

using namespace std;

class SpecialMap {
  // usual stuff...

 private:
  int counter_;
  map<int, string> insertion_order_;
  map<string, int> data_;
};

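You can then expose an iterator that walks data_ in the proper order. To do that, iterate through insertion_order_, and for each element you get from that iteration, do a lookup in data_ using the value from insertion_order_.

You can use the more efficient hash_map for data_, since you never iterate over data_ directly; insertion_order_ has to stay an ordered map because it is traversed in key order.

To do inserts, you can have a method like this: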

void SpecialMap::Insert(const string& key, int value) {
  // Record the insertion order only the first time a key is seen;
  // overwriting an existing key keeps its original position.
  if (data_.find(key) == data_.end()) {
    insertion_order_[counter_++] = key;
  }
  data_[key] = value;
}

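There are a lot of ways to improve the design and tune performance, but this is a good skeleton to get you started on implementing this functionality yourself. You can make it templated, and you might store pairs as values in data_ so that you can easily reference the matching entry in insertion_order_. But I leave these design issues as an exercise :-).

Update: I suppose I should say something about the efficiency of using a map versus a vector for insertion_order_

Lookups directly into data_ cost the same in both cases: O(log n) with a std::map, or O(1) on average if data_ is a hash_map.
Inserts in the vector approach are O(1) (amortized); inserts in the map approach are O(log n).
Deletes in the vector approach are O(n) because you have to scan for the item to remove. With the map approach they are O(log n), provided you know the key's sequence number (for example, because it is stored alongside the value in data_).

If you are not going to use deletes much, you should probably use the vector approach. The map approach would be better if you were supporting a different ordering (such as priority) instead of insertion order.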

score 2 · Accepted Answer

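Here is a solution that needs only the standard template library, without using Boost's multi_index:
You can use a std::map<std::string,int> together with a vector<data>, where the map stores the index of each item's position in the vector and the vector stores the data in insertion order. Access to the data then has O(log n) complexity, displaying the data in insertion order has O(n) complexity, and inserting data has O(log n) complexity.

For example: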

#include <iostream>
#include <map>
#include <string>
#include <vector>

struct data {
    int value;
    std::string s;
};

typedef std::map<std::string, int> MapIndex; // maps a string to the index of its
                                             // data in VectorData
typedef std::vector<data> VectorData;        // stores the data in insertion order

void display_data_according_insertion_order(const VectorData& vectorData) {
    for (VectorData::const_iterator it = vectorData.begin(); it != vectorData.end(); ++it) {
        std::cout << it->value << it->s << std::endl;
    }
}

int lookup_string(const std::string& s, const MapIndex& mapIndex) {
    MapIndex::const_iterator pt = mapIndex.find(s);
    if (pt != mapIndex.end()) return pt->second;
    else return -1; // signifies that the key does not exist in the map
}

// The containers are taken by reference so that the insertion persists.
int insert_value(const data& d, MapIndex& mapIndex, VectorData& vectorData) {
    if (mapIndex.find(d.s) == mapIndex.end()) {
        // The data is appended at the back, so its index is the size of
        // the vector before insertion.
        mapIndex.insert(std::make_pair(d.s, static_cast<int>(vectorData.size())));
        vectorData.push_back(d);
        return 1;
    }
    else return 0; // insertion failed: the string is already present in the
                   // map, and the map stores unique keys
}

score 2 · Accepted Answer

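You cannot do that with a map alone, but you could use two separate structures, a map and a vector, and keep them synchronized: that is, when you delete from the map, find and delete the element from the vector as well. Or you could create a map<string, pair<int,int>> and store in the pair both the map's size() at insertion time, to record the position, and the int value itself; then, when you print, sort by the position member.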
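
The second suggestion (storing the position inside the mapped pair and sorting at print time) might look roughly like the sketch below. The names PositionedMap, increment, byPosition, and keysInInsertionOrder are hypothetical, and std::size_t is used for the position instead of int. The scheme assumes nothing is ever erased: after an erase, size() could repeat a position that is already in use, so a separate ever-increasing counter would be needed in that case.

```cpp
#include <algorithm>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// value.first  = position (the map's size at the time of first insertion)
// value.second = the counter itself
typedef std::map<std::string, std::pair<std::size_t, int> > PositionedMap;

// Increment key's counter, recording its position the first time it is seen.
void increment(PositionedMap& m, const std::string& key) {
    PositionedMap::iterator it = m.find(key);
    if (it == m.end()) {
        std::size_t position = m.size();
        it = m.insert(std::make_pair(key, std::make_pair(position, 0))).first;
    }
    ++it->second.second;
}

// Orders two map entries by their recorded position.
bool byPosition(const PositionedMap::value_type* a, const PositionedMap::value_type* b) {
    return a->second.first < b->second.first;
}

// Collect pointers to the entries, sort them by position, and return the keys.
std::vector<std::string> keysInInsertionOrder(const PositionedMap& m) {
    std::vector<const PositionedMap::value_type*> entries;
    for (PositionedMap::const_iterator it = m.begin(); it != m.end(); ++it)
        entries.push_back(&*it);
    std::sort(entries.begin(), entries.end(), byPosition);

    std::vector<std::string> keys;
    for (std::size_t i = 0; i < entries.size(); ++i)
        keys.push_back(entries[i]->first);
    return keys;
}
```

The sort costs O(n log n) but runs only when printing, so the millions of increments still pay just one map lookup each.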

score 1 · Accepted Answer

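This is somewhat related to Faisal's answer. You can create a wrapper class around a map and a vector and easily keep them synchronized. Proper encapsulation lets you control the access methods and hence which container is used: the vector or the map. This avoids using Boost or anything like it.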

score 1 · Accepted Answer

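One thing you need to consider is the small number of data elements you are using. It is possible that using just the vector will be faster. A map carries some overhead that can make lookups in small data sets more expensive than in the simpler vector. So, if you know that you will always be using about the same number of elements, do some benchmarking and see whether the performance of the map and the vector is what you really think it is. You may find that a lookup in a vector with only 50 elements costs nearly the same as in the map.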
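
For benchmarking, the vector-only variant can be as small as the sketch below. The names Entries and incrementLinear are hypothetical; the function performs a plain linear search and appends new keys at the back, so the vector is already in first-insertion order when it is printed.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

typedef std::vector<std::pair<std::string, int> > Entries;

// Linear search: increment the counter for key, appending the key if it is new.
void incrementLinear(Entries& entries, const std::string& key) {
    for (std::size_t i = 0; i < entries.size(); ++i) {
        if (entries[i].first == key) {
            ++entries[i].second;
            return;
        }
    }
    entries.push_back(std::make_pair(key, 1));
}
```

Whether this beats a map for about 50 keys depends on the key lengths and on the hardware, which is why measuring both with realistic data is the safer course.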

score 1 · Accepted Answer

// It should be like this, man!

// This keeps insertion at O(log N) and makes deletion O(log N) as well.

class SpecialMap {
private:
  int counter_;
  map<int, string> insertion_order_;
  map<string, int> insertion_order_reverse_look_up; // <- for fast delete
  map<string, Data> data_;
};
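
A possible completion of this three-map design is sketched below. It is only an illustration under a few assumptions: the value type is int rather than the unspecified Data, the reverse-lookup member carries a trailing underscore, and the member functions insert, erase, keys, and find are invented names.

```cpp
#include <map>
#include <string>
#include <vector>

class SpecialMap {
public:
    SpecialMap() : counter_(0) {}

    // Insert or overwrite; a key keeps the sequence number of its first insertion.
    void insert(const std::string& key, int value) {
        if (data_.find(key) == data_.end()) {
            insertion_order_[counter_] = key;
            insertion_order_reverse_look_up_[key] = counter_;
            ++counter_;
        }
        data_[key] = value;
    }

    // Erase in O(log n): the reverse map gives the sequence number directly,
    // so insertion_order_ does not have to be scanned.
    bool erase(const std::string& key) {
        std::map<std::string, int>::iterator it =
            insertion_order_reverse_look_up_.find(key);
        if (it == insertion_order_reverse_look_up_.end()) return false;
        insertion_order_.erase(it->second);
        insertion_order_reverse_look_up_.erase(it);
        data_.erase(key);
        return true;
    }

    // Keys in insertion order (insertion_order_ is sorted by sequence number).
    std::vector<std::string> keys() const {
        std::vector<std::string> out;
        for (std::map<int, std::string>::const_iterator it = insertion_order_.begin();
             it != insertion_order_.end(); ++it)
            out.push_back(it->second);
        return out;
    }

    // Returns a pointer to the stored value, or a null pointer if key is absent.
    const int* find(const std::string& key) const {
        std::map<std::string, int>::const_iterator it = data_.find(key);
        return it == data_.end() ? 0 : &it->second;
    }

private:
    int counter_;
    std::map<int, std::string> insertion_order_;
    std::map<std::string, int> insertion_order_reverse_look_up_; // for fast delete
    std::map<std::string, int> data_;
};
```

The price of the fast delete is a third map to keep in sync, so every insertion of a new key performs three O(log n) updates.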

score 0 · Accepted Answer

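There is no need to use a separate std::vector or any other container to keep track of the insertion order. You can do what you want as shown below. If you want to preserve the insertion order, you can use the following program (version 1):

Version 1: using std::map<std::string,int> to count unique strings in insertion order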

#include <iostream>
#include <map>
#include <sstream>
#include <string>
int findExactMatchIndex(const std::string &totalString, const std::string &toBeSearched)
{
    std::istringstream ss(totalString);
    std::string word;
    std::size_t index = 0;
    while(ss >> word)
    {
        if(word == toBeSearched)
        {
            return index;
        }
        ++index;
    }
    return -1;//return -1 when the string to be searched is not inside the inputString
}
int main() {
    std::string inputString = "this is a string containing my name again and again and again ", word;
   
   //this map maps the std::string to their respective count
    std::map<std::string, int> wordCount;
    
    std::istringstream ss(inputString);
    
    while(ss >> word)
    {
        //std::cout<<"word:"<<word<<std::endl;
        wordCount[word]++;
    }
  
    std::cout<<"Total unique words are: "<<wordCount.size()<<std::endl;
    
    std::size_t i = 0;
    
    std::istringstream gothroughStream(inputString);
    
    //just go through the inputString(stream) instead of map
    while( gothroughStream >> word)
    {
        int index = findExactMatchIndex(inputString, word);
        
        
        if(index != -1 && (index == i)){
         std::cout << word <<"-" << wordCount.at(word)<<std::endl;
         
        }
        ++i;
    }
   
    return 0;
}

The output of the above program is as follows:

Total unique words are: 9
this-1
is-1
a-1
string-1
containing-1
my-1
name-1
again-3
and-2

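Note that in the above program, a word followed by a comma or any other delimiter is counted as a separate word. For example, given the string this is, my name is, the string is, has a count of 1 and the string is has a count of 1. That is, is, and is are treated as different words. This is because the computer does not know our definition of a word.

Note

The above program is a modification of my answer to "How do I output the characters in an array in order in this nested for loop?" Version 2 follows:

Version 2: using std::map<char, int> to count unique characters in insertion order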

#include <iostream>
#include <map>
#include <string>
int main() {
    std::string inputString;
    std::cout<<"Enter a string: ";
    std::getline(std::cin,inputString);
    //this map maps the char to their respective count
    std::map<char, int> charCount;
    
    for(char &c: inputString)
    {
        charCount[c]++;
    }
    
    std::size_t i = 0;
    //just go through the inputString instead of map
    for(char &c: inputString)
    {
        std::size_t index = inputString.find(c);
        if(index != inputString.npos && (index == i)){
         std::cout << c <<"-" << charCount.at(c)<<std::endl;
         
        }
        ++i;
    }
    return 0;
}

In both cases/versions, there is no need to use a separate std::vector or any other container to keep track of the insertion order.

score -1 · Accepted Answer


Use boost::multi_index with a map index and a list index.

answered 2013-10-21T09:04:40.320

score -1 · Accepted Answer

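A map of pairs (str,int), plus a static int that is incremented on each insert call and indexes the pairs of data. Perhaps put it in a struct that can return the static int value through an index() member?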
