
I want to create a new element range index in my ML database. How can I estimate the size of this new index? I'm using ML 8.0-3.2.


2 Answers


The best approach is to test on a representative sample of your data and then extrapolate.
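A minimal sketch of the "measure a sample, then extrapolate" step, assuming the index size grows linearly with the number of documents. The function name and the sample numbers are hypothetical. Linear scaling is a reasonable first guess for numeric range indexes, but it can overestimate string range indexes, whose distinct values are shared within a stand.

```python
def extrapolate_index_bytes(sample_index_bytes: int, sample_docs: int, total_docs: int) -> float:
    """Scale an index size measured on a sample linearly to the full document count.

    Assumes size is proportional to document count; this is an approximation,
    not a MarkLogic formula.
    """
    if sample_docs <= 0:
        raise ValueError("sample_docs must be positive")
    return sample_index_bytes * total_docs / sample_docs


# Hypothetical measurement: adding the index grew a 100,000-document sample by 12 MB.
estimate = extrapolate_index_bytes(12_000_000, 100_000, 5_000_000)
print(f"{estimate / 1e6:.0f} MB")  # 600 MB
```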

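String indexes share unique values and unique tokens within a stand, so their size depends heavily on the number of distinct values and is hard to calculate in advance.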

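For other data types, the size depends on the number of values actually present in the content. If you know there are, on average, k values per document across N documents, you would expect 8*N*k bytes, or 16*N*k bytes if you have positions turned on. Float indexes are half this size; point indexes are double this size if they use double precision.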
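The rule of thumb above can be written out as a small calculator. This is a sketch under stated assumptions: the `kind` labels are illustrative names, not MarkLogic configuration values, and the float and double-precision-point multipliers are assumed to apply on top of the positions setting.

```python
def estimate_range_index_bytes(n_docs: int, avg_values_per_doc: float,
                               positions: bool = False,
                               kind: str = "other") -> float:
    """Rough size of a non-string range index: 8*N*k bytes, or 16*N*k with positions.

    kind (hypothetical labels):
      "other"        - e.g. int or dateTime: base size
      "float"        - half the base size
      "double-point" - geospatial point index using double precision: twice the base size
    """
    bytes_per_value = 16 if positions else 8
    multiplier = {"other": 1.0, "float": 0.5, "double-point": 2.0}[kind]
    return n_docs * avg_values_per_doc * bytes_per_value * multiplier


# 1,000,000 documents with 3 values each, no positions.
print(estimate_range_index_bytes(1_000_000, 3))  # 24000000.0
```

Treat the result as an order-of-magnitude figure; it ignores compression and per-stand overhead.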

answered 2018-12-07T15:52:04.770

Index data is stored in MARKLOGIC_DATA_DIR (the location depends on your install), in the subdirectory Forests/<Forest Name>/, along with the non-index data. The index and non-index data are interdependent. If your intent is to estimate how much more disk space a new index will take, measure the total size of all the forest directories for your database without that index, then add the index, measure again once reindexing has finished, and subtract.
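A minimal sketch of the before/after measurement, assuming your forests live under `<data_dir>/Forests/<name>/` as described above. The function names and forest names are hypothetical; this only walks the filesystem and does not talk to MarkLogic.

```python
import os
from pathlib import Path


def dir_size_bytes(path: Path) -> int:
    """Sum the sizes of all regular files under path (symlinks are skipped)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            file_path = os.path.join(root, name)
            if os.path.islink(file_path):
                continue
            try:
                total += os.path.getsize(file_path)
            except FileNotFoundError:
                # A file can vanish between listing and sizing (e.g. a stand removed by a merge).
                continue
    return total


def forests_size_bytes(data_dir: Path, forest_names: list[str]) -> int:
    """Total on-disk size of <data_dir>/Forests/<name>/ for each named forest.

    data_dir stands in for MARKLOGIC_DATA_DIR; forest_names should be the
    forests attached to your database.
    """
    return sum(dir_size_bytes(data_dir / "Forests" / name) for name in forest_names)
```

Call `forests_size_bytes` once before adding the index and again after reindexing has finished (for example with `data_dir=Path("/var/opt/MarkLogic")`, the usual default on Linux, and your database's forest names); the difference is the measured cost of the index. Because merges rewrite stands, sizes can fluctuate while merges are running, so it is safer to measure when the forests are idle.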

Yes, I know that doesn't sound much like an 'estimate'. Anything else is a rough guess.

For a 'rough guess', the answer is 'it depends', and any guess should be calibrated by actually trying it. As a rule of thumb, a typical text index takes about 8 bytes per (term, document) pair: the sum, over all distinct terms, of 8 * the number of documents that contain that term.

Each index entry will contain at least one 64-bit value for each document containing that term. In addition, it will store an encoded version of the term itself (possibly shared with other indexes).
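The rough guess above (one 64-bit value per document containing a term, plus the stored term) can be sketched on a sample of documents. This is illustrative only: whitespace splitting is a naive stand-in for MarkLogic's tokenizer, and the UTF-8 length of each distinct term is an assumed proxy for the encoded term, which the real index may store differently or share with other indexes.

```python
from collections import Counter


def rough_text_index_bytes(docs: list[str], bytes_per_posting: int = 8) -> int:
    """Rough text-index size: 8 bytes per (term, document) pair plus each distinct term's bytes."""
    doc_freq = Counter()  # term -> number of documents containing it
    for text in docs:
        # Naive tokenization (assumption): lowercase, split on whitespace, count each term once per doc.
        doc_freq.update(set(text.lower().split()))
    postings_bytes = sum(doc_freq.values()) * bytes_per_posting
    term_bytes = sum(len(term.encode("utf-8")) for term in doc_freq)
    return postings_bytes + term_bytes


print(rough_text_index_bytes(["the cat sat", "the dog sat"]))  # 60
```

Here "the" and "sat" each appear in two documents and "cat" and "dog" in one, giving 6 postings (48 bytes) plus 12 bytes of terms.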

This 'rough guess' may be off by 10x or more, depending on the kind of index, the distribution of the data, compression, encryption, and so on. That is why you should really compare sizes before and after adding a similar index.

answered 2018-12-07T15:55:32.463