c++ - Storing heterogeneous data contiguously in memory as a sequence of chars

Question

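I have a large number of strings and some data associated with each string. For simplicity, assume the data is one int per string. Say I have a std::vector<std::tuple<std::string, int>>. I would like to store this data contiguously in memory using a single heap allocation. I will not need to add or remove strings later.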

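A simple example

Constructing a std::string requires a heap allocation, and accessing the chars of a std::string requires a dereference. If I have a bunch of strings, I can make better use of memory by storing all of them in a single std::string and keeping each string's starting index and size as separate variables. If I wanted to, I could even try storing the starting index and size inside the std::string itself.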
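The single-buffer idea above can be sketched roughly as follows. This is a minimal illustration, not a definitive design: the names `StringPool`, `Span`, `add`, and `get` are hypothetical, and C++17 is assumed for `std::string_view`. Note that this version still uses two allocations (one for the characters, one for the offset table).

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical helper: all strings share one character buffer.
struct StringPool {
    struct Span {
        std::size_t offset;  // index of the first char inside `chars`
        std::size_t length;  // number of chars in this string
    };

    std::string chars;        // every string, stored back to back
    std::vector<Span> spans;  // one entry per stored string

    // Append a string to the shared buffer and record where it lives.
    void add(std::string_view s) {
        spans.push_back({chars.size(), s.size()});
        chars.append(s);
    }

    // Return a view of the i-th string.
    // The view is valid only until `chars` is modified again.
    std::string_view get(std::size_t i) const {
        assert(i < spans.size());
        return std::string_view(chars).substr(spans[i].offset, spans[i].length);
    }
};
```

After `pool.add("foo"); pool.add("barbaz");`, calling `pool.get(1)` yields a view of `barbaz` without any per-string allocation.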

Back to my question

One idea I had is to store everything in a std::string or std::vector<char>. Each entry of the std::vector<std::tuple<std::string, int>> would be laid out in memory like this:

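The length of the next string (int or size_t)
The sequence of chars representing the string (chars)
Some number of null characters for proper int alignment (chars)
The data (int)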

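This requires being able to interpret a sequence of chars as an int. There have been questions about this before, but it seems to me that trying to do so could lead to undefined behavior. I believe I can guard against this by checking sizeof(int).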
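One way to sketch the layout described above without reinterpreting chars as an int is to copy the bytes with `std::memcpy`, which is well defined for trivially copyable types such as int. The function names `align_up`, `append_record`, and `read_record` are hypothetical, the length is stored as an int for simplicity, and this is only an illustration under those assumptions, not a vetted implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <string_view>
#include <utility>
#include <vector>

// Round `n` up to the next multiple of `a` (`a` must be a power of two).
inline std::size_t align_up(std::size_t n, std::size_t a) {
    return (n + a - 1) & ~(a - 1);
}

// Append one record: [int length][chars][zero padding][int data].
inline void append_record(std::vector<char>& buf, std::string_view s, int data) {
    const int len = static_cast<int>(s.size());
    char raw[sizeof(int)];

    std::memcpy(raw, &len, sizeof(int));  // copy the bytes of the length
    buf.insert(buf.end(), raw, raw + sizeof(int));
    buf.insert(buf.end(), s.begin(), s.end());
    buf.resize(align_up(buf.size(), alignof(int)), '\0');  // pad with zeros
    std::memcpy(raw, &data, sizeof(int));  // copy the bytes of the data
    buf.insert(buf.end(), raw, raw + sizeof(int));
}

// Read the record that starts at `pos` and advance `pos` to the next record.
inline std::pair<std::string, int> read_record(const std::vector<char>& buf,
                                               std::size_t& pos) {
    int len = 0;
    std::memcpy(&len, buf.data() + pos, sizeof(int));
    pos += sizeof(int);

    std::string s(buf.data() + pos, static_cast<std::size_t>(len));
    pos = align_up(pos + static_cast<std::size_t>(len), alignof(int));

    int data = 0;
    std::memcpy(&data, buf.data() + pos, sizeof(int));
    pos += sizeof(int);
    return {std::move(s), data};
}
```

Because `std::memcpy` does not require aligned addresses, the padding is not strictly needed for correctness here; it is kept only to mirror the layout in the question. Calling `buf.reserve(...)` with the total size up front would keep everything in a single allocation.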

My other option is to create a union:

union CharInt {
    char some_chars[sizeof(int)];  // array bound goes after the name in C++
    int data;
};

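Here I would need to take care that the number of chars per int is determined at compile time from sizeof(int). I would then store a std::vector<CharInt>. This seems more "C++" than using reinterpret_cast. One drawback is that accessing the char member of a second CharInt requires an extra pointer addition (a pointer to the CharInt + 1). This cost still seems small relative to the benefit of making everything contiguous.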
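As a rough sketch of how a string might be spread over several union slots, the code below writes a length slot, then as many char slots as the string needs, then a data slot. The names `slots_for`, `append_entry`, and `read_chars` are hypothetical, the union member is named `an_int` as in the later example, and each slot is only ever read through the member that was last written. C++17 is assumed.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <string_view>
#include <vector>

union CharInt {
    char some_chars[sizeof(int)];
    int an_int;
};
static_assert(sizeof(CharInt) == sizeof(int), "slot should be exactly one int wide");

// Number of CharInt slots needed to hold `n` chars (ceiling division).
constexpr std::size_t slots_for(std::size_t n) {
    return (n + sizeof(int) - 1) / sizeof(int);
}

// Append [length slot][char slots...][data slot] to `out`.
inline void append_entry(std::vector<CharInt>& out, std::string_view s, int data) {
    CharInt len;
    len.an_int = static_cast<int>(s.size());
    out.push_back(len);

    for (std::size_t i = 0; i < s.size(); i += sizeof(int)) {
        CharInt c{};  // zero-initializes some_chars, which becomes the active member
        const std::size_t n = std::min(sizeof(int), s.size() - i);
        std::memcpy(c.some_chars, s.data() + i, n);
        out.push_back(c);
    }

    CharInt d;
    d.an_int = data;
    out.push_back(d);
}

// Rebuild `len` chars starting at slot `first`, one slot at a time.
inline std::string read_chars(const std::vector<CharInt>& v, std::size_t first,
                              std::size_t len) {
    assert(first + slots_for(len) <= v.size());
    std::string s;
    s.reserve(len);
    for (std::size_t i = 0; i < len; ++i) {
        s.push_back(v[first + i / sizeof(int)].some_chars[i % sizeof(int)]);
    }
    return s;
}
```

Indexing slot by slot, as `read_chars` does, avoids walking a char pointer from one union element into the next; the division and remainder are the "extra pointer addition" mentioned above.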

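Is this the better option? Are there other options? Are there pitfalls with the union approach that I need to consider?

Edit:

I would like to illustrate how CharInt would be used. I have provided an example below: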

#include <iostream>
#include <string>
#include <vector>


class CharIntTest {
public:
    CharIntTest() {
        my_trie.push_back(CharInt{ 42 });
        std::string example_string{ "this is a long string" };
        my_trie.push_back(CharInt{ example_string, 5 });
        my_trie.push_back(CharInt{ 106 });
    }

    int GetFirstInt() {
        return my_trie[0].an_int;
    }

    char GetFirstChar() {
        return my_trie[1].some_chars[0];
    }

    char GetSecondChar() {
        return my_trie[1].some_chars[1];
    }

    int GetSecondInt() {
        return my_trie[2].an_int;
    }

private:

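    union CharInt {
        // The initializer below copies exactly four chars, so it assumes a 4-byte int.
        static_assert(sizeof(int) == 4, "this example assumes sizeof(int) == 4");

        // The caller must ensure that s has at least index + sizeof(int) chars.
        CharInt(const std::string& s, int index) : some_chars{ s[index], s[index+1], s[index+2], s[index+3]} {
        }

        CharInt(int i) : an_int{ i } {
        }

        char some_chars[sizeof(int)];
        int an_int;
    };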

    std::vector<CharInt> my_trie;

};

Note that I do not access the first or third CharInt as if they were chars, and I do not access the second CharInt as if it were an int. Here is main:

int main() {
    CharIntTest tester{};

    std::cout << tester.GetFirstInt() << "\n";
    std::cout << tester.GetFirstChar() << "\n";
    std::cout << tester.GetSecondChar() << "\n";
    std::cout << tester.GetSecondInt();
}

This produces the desired output.
