2

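I quote the Tokyo Cabinet documentation...

For hash table databases, each key must be unique within a database, so it is impossible to store two or more records with the same key.

Or does tokyocabinet allow tuple-based keys?

What is the best way to set up one-to-many storage (for example, a crawler mapping 1 kw <-> many docids)?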

~B

4

2 Answers

1

Using the table database (TDB), you can simply store a list of keys in one value as tokens. As long as your keys are valid "tokens", you can easily list them this way in a single field.

Here's an example using Pyrant's low-level interface:

>>> from pyrant import Tyrant
>>> t = Tyrant()
>>> includes = 5  # code for operation TDBQCSTROR
>>> t['test'] = {'foo': 'abc,def', 'bar': 'abc def', 'quux': 'abcdef'}
>>> t.proto.search([('foo',includes,'abc')])
[u'test']
>>> t.proto.search([('bar',includes,'abc')])
[u'test']
>>> t.proto.search([('quux',includes,'abc')])
[]
>>> t.proto.search([('quux',includes,'abcd')])
[]
>>> t.proto.search([('quux',includes,'abcdef')])
[u'test']

TDBQCSTROR is an operation type which stands for "string includes at least one token in..." (see "tctdbqryaddcond" in Tokyo Cabinet API specs).

Note that both "abc,def" and "abc def" matched the "abc" keyword, but "abcdef" did not, even though "abc" is a substring of "abcdef". Matching is done on whole tokens, not substrings. This can be used to search keys stored in a single string, e.g.:

t['tokyocabinet'] = {'title': 'Tokyo Cabinet'}
t['primary-key'] = {'title': 'Primary Key'}
t['question1228313'] = {
    'title': 'how to build one to many rows in tokyo cabinet?',
    'tags': 'tokyocabinet, primary-key',
}

(Tags are probably not the best example as they don't need to be references.)
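
To make the whole-token behaviour concrete, here is a minimal pure-Python sketch that imitates the matching seen in the session above. It is not Tokyo Cabinet's or Pyrant's code: the function names `split_tokens` and `includes_any_token` are made up for illustration, and the assumption that tokens are separated by spaces and/or commas is inferred only from the results shown earlier.

```python
import re


def split_tokens(text):
    """Split text into a set of tokens.

    Assumption: tokens are separated by spaces and/or commas, as the
    search results in the session above suggest.
    """
    return {token for token in re.split(r"[ ,]+", text) if token}


def includes_any_token(field_value, query):
    """Return True if field_value shares at least one whole token with query."""
    return bool(split_tokens(field_value) & split_tokens(query))


# Same record as in the Pyrant session above.
record = {'foo': 'abc,def', 'bar': 'abc def', 'quux': 'abcdef'}
matches = {name: includes_any_token(value, 'abc')
           for name, value in record.items()}
print(matches)  # {'foo': True, 'bar': True, 'quux': False}
```

The same check applied to the 'tags' field above, `includes_any_token('tokyocabinet, primary-key', 'primary-key')`, is true, which is what makes a token list usable as a set of references.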

If you are using a TC database of another kind (not TDB), I cannot imagine a valid solution. You may want to ask this question in the related discussion group.

Answered 2009-08-23T05:09:42.070
0

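For hash table databases, each key must be unique within a database, so it is impossible to store two or more records with the same key.

Tokyo Cabinet's B+ tree database allows duplicate keys: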

bool tcbdbputdup(TCBDB *bdb, const void *kbuf, int ksiz, const void *vbuf, int vsiz); 

Using the Ruby API:

TokyoCabinet::BDB.putdup(key, value) -> true|false
TokyoCabinet::BDB.getlist(key) => [value, ...]|nil
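
For the keyword-to-docids case from the question, the duplicate-key pattern can be sketched as follows. This is only an in-memory stand-in written in Python, not a Tokyo Cabinet binding: the class `DupKeyIndex` is hypothetical, and its two methods merely mirror the putdup/getlist signatures listed above (a list of values, or None when the key is absent).

```python
from collections import defaultdict


class DupKeyIndex:
    """Hypothetical in-memory stand-in for a store that allows duplicate keys."""

    def __init__(self):
        self._values = defaultdict(list)

    def putdup(self, key, value):
        """Add another value under key, keeping any values already stored."""
        self._values[key].append(value)
        return True

    def getlist(self, key):
        """Return all values stored under key, or None if the key is absent."""
        if key not in self._values:
            return None
        return list(self._values[key])


index = DupKeyIndex()
index.putdup('tokyo', 'doc1')
index.putdup('tokyo', 'doc7')
index.putdup('cabinet', 'doc7')

print(index.getlist('tokyo'))    # ['doc1', 'doc7']
print(index.getlist('missing'))  # None
```

With a real B+ tree database, the two calls would go to the binding's own putdup and getlist instead of this class.
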
Answered 2010-06-02T12:21:31.270