database-design - 大型持久类阵列结构

Question

您将使用哪种数据结构来实现类似结构的大型持久数组（大小 > 10 个 Mio 元素）

数据结构的接口应该支持以下操作：

YType get(IdxType idx) // random access  
void insertIdx(IdxType idx, YType yValue) // random insert   
void deleteIdx(IdxType idx) // random delete

让索引为标量无符号值，IdxType例如标量值或结构。unsigned long longYType

这些操作的复杂性永远不应大于 O(log n)，如果随机访问的复杂性在一段时间后下降到 O(1)，这将非常有效，因为对于许多用例来说，读取操作将被使用得最多。

这些要求排除了可以写入磁盘的内存数据结构（例如向量或列表），因为向量的插入复杂度是 O(n)，而列表的随机访问也是 O(n)。

编辑：
请注意，带有索引键的哈希映射或树状数据结构不符合要求。某个索引的值可以更改。即，当为索引插入一个值时，所有后续索引都会更改它们的值。这正是插入元素时数组中后续索引发生的情况。

score 0 · Accepted Answer

You might want to use an order statistic tree, a tree data structure that supports O(log n) insertions, deletions, and lookups. Typically, order statistic trees are used to store sorted data, though it is relatively straightforward to modify them to store an unsorted sequence.

Intuitively, an order statistic tree is a binary tree where each node stores the number of children in its left subtree. That way, you can recursively look up the nth element as follows:

If the tree is null, there is no element.
If n is less than or equal to the number of elements in the left subtree (call it I), recursively look up the element in the left subtree.
If n = k + 1, the root of the tree is the nth element.
Otherwise, look up the n - k - 1st element in the right subtree.

To do an insertion before node n, use this same procedure to find the nth node and insert the new value as that nodes inorder predecessor (go left, then keep going right until you walk off the tree and insert the node there). Then walk back up the tree from nth element upward and adjust the value of k for all nodes on the insertion path that need to be updated.

To prevent the tree from getting too imbalance, you can use any standard tree balancing scheme (AVL, red/black, etc.) to keep the height at O(log n). This gives O(log n) performance for all the operations on the data structure.

Hope this helps!

score 0 · Accepted Answer

取决于您是否需要同时访问它...

不

如果您只需要一个可以在任何给定时间访问它的客户端应用程序，请使用内存中的结构，例如rope（或者甚至只是使用“数组”索引作为键的哈希表），这可以在访问和修改之间提供良好的平衡复杂。完成后，将其序列化为一个简单文件。如今，RAM 既便宜又丰富，除非每个元素都非常大，否则 1000 万个元素应该可以放入其中。

是的

如果您需要并发访问、事务等，请使用数据库。您的“数组”实际上是一个 B 树，其键是数组索引。如果您的 DBMS 支持，请使用集群来消除“不必要的”表堆并将所有内容保留在 B-Tree 中。像这样的东西：

CREATE TABLE THE_ARRAY (
    INDEX INT PRIMARY KEY,
    DATA VARCHAR(50) -- Whatever...
) ORGRANIZATION INDEX

（ORGRANIZATION INDEX 是 Oracle 特定的语法，用于其他 DBMS 中所谓的“集群”；使用适合您的 DBMS 的任何语法。）

插入将是 O(N)（因为您需要在插入后更新索引）并且查找将是 O(log(N))。此外，一些 DBMS 支持哈希索引，这应该允许 O(1) 查找，但对于插入来说可能比 B-Tree 更糟糕。

database-design - 大型持久类阵列结构

2 回答 2

不

是的

Related

Reference