python - Storing an inverted index in MySQL

Question

I am working on building a very large inverted index of terms. Which approach would you suggest?

First

termId -> docId
  a        doc2[locations],doc5[locations],doc12[locations] 
  b        doc5[locations],doc7[locations],doc4[locations]

Second

termId -> docId
  a        doc2[locations]
  a        doc5[locations]
  a        doc12[locations]
  b        doc5[locations]
  b        doc7[locations] 
  b        doc4[locations]

P.S. Lucene is not an option.

score 1 · Accepted Answer

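The right table design depends on how you plan to use the data. If you intend to use the string "doc2[locations],doc5[locations],doc12[locations]" as is, with no further post-processing, then your First design is fine.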

However, if, as your question implies, you may sometimes want to treat doc2[locations], doc5[locations], etc. as separate entities, then you should definitely use your Second design.

Here are some use cases that show why the Second design is better:

If you use First and ask for all documents where termId = a, you get back a single string, doc2[locations],doc5[locations],doc12[locations], which you then have to split.

If you use Second, you get each document as a separate row. No splitting!
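The difference between the two lookups can be sketched in Python. This is only an illustration under stated assumptions: the standard-library `sqlite3` module stands in for MySQL so the snippet runs without a server, and the table names `first_design` and `second_design`, the `docIds` column, and the helper functions are hypothetical names that do not appear in the question.

```python
import sqlite3

# Assumption: sqlite3 is used as a stand-in for MySQL so the sketch is runnable.
conn = sqlite3.connect(":memory:")

# First design: one row per term, all postings packed into a single string.
conn.execute("CREATE TABLE first_design (termId TEXT PRIMARY KEY, docIds TEXT)")
conn.execute(
    "INSERT INTO first_design VALUES (?, ?)",
    ("a", "doc2[locations],doc5[locations],doc12[locations]"),
)

# Second design: one row per (term, document) pair.
conn.execute("CREATE TABLE second_design (termId TEXT, docId TEXT)")
conn.executemany(
    "INSERT INTO second_design VALUES (?, ?)",
    [("a", "doc2[locations]"), ("a", "doc5[locations]"), ("a", "doc12[locations]")],
)


def docs_first(term):
    """First design: fetch one string, then split it in application code."""
    row = conn.execute(
        "SELECT docIds FROM first_design WHERE termId = ?", (term,)
    ).fetchone()
    return row[0].split(",") if row else []


def docs_second(term):
    """Second design: each document already arrives as its own row."""
    rows = conn.execute(
        "SELECT docId FROM second_design WHERE termId = ? ORDER BY rowid", (term,)
    )
    return [r[0] for r in rows]


print(docs_first("a"))
print(docs_second("a"))
```

Note that the `split(",")` in `docs_first` only works here because the placeholder text `[locations]` contains no commas; if the real location lists were comma-separated, the First design would need a more careful parser.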

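The Second structure is more convenient.
Or suppose that at some point doc5[locations] changes and you need to update your table. If you use the First design, you have to use some relatively complex MySQL string functions to find and replace the substring in every row that contains it. (Note that MySQL versions before 8.0 have no built-in regular-expression replace; REGEXP_REPLACE() was added in MySQL 8.0.)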
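As a rough sketch of what such a First-design update could look like, the snippet below rewrites the packed string with the SQL `REPLACE()` and `INSTR()` string functions, which exist in both MySQL and SQLite. It assumes `sqlite3` as a stand-in for MySQL, and the table name `first_design` and column `docIds` are hypothetical.

```python
import sqlite3

# Assumption: sqlite3 stands in for MySQL; REPLACE() and INSTR() exist in both.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE first_design (termId TEXT PRIMARY KEY, docIds TEXT)")
conn.executemany(
    "INSERT INTO first_design VALUES (?, ?)",
    [
        ("a", "doc2[locations],doc5[locations],doc12[locations]"),
        ("b", "doc5[locations],doc7[locations],doc4[locations]"),
    ],
)

old, new = "doc5[locations]", "newdoc5[locations]"

# Every row whose packed string contains the old entry has to be rewritten.
conn.execute(
    "UPDATE first_design SET docIds = REPLACE(docIds, ?, ?) "
    "WHERE INSTR(docIds, ?) > 0",
    (old, new, old),
)
conn.commit()

packed = dict(conn.execute("SELECT termId, docIds FROM first_design"))
print(packed)
```

A plain substring replace like this also touches any entry that merely contains the old text. For example, running the same statement a second time would turn `newdoc5[locations]` into `newnewdoc5[locations]`, which is part of why the packed layout is awkward to maintain.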

If you use the Second design, the update is easy:
```
UPDATE `table` SET docId = 'newdoc5[locations]' WHERE docId = 'doc5[locations]';
```
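Since the question is tagged python, here is a minimal sketch of issuing the same kind of update from Python with a parameterized query. It assumes `sqlite3` as a stand-in for MySQL, and the table name `postings` is hypothetical. MySQL drivers such as PyMySQL use `%s` placeholders where `sqlite3` uses `?`.

```python
import sqlite3

# Assumption: sqlite3 stands in for MySQL; "postings" is a hypothetical table name.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE postings (termId TEXT, docId TEXT)")
conn.executemany(
    "INSERT INTO postings VALUES (?, ?)",
    [
        ("a", "doc2[locations]"),
        ("a", "doc5[locations]"),
        ("a", "doc12[locations]"),
        ("b", "doc5[locations]"),
        ("b", "doc7[locations]"),
        ("b", "doc4[locations]"),
    ],
)

# One statement updates every row for the changed document; no string surgery.
cur = conn.execute(
    "UPDATE postings SET docId = ? WHERE docId = ?",
    ("newdoc5[locations]", "doc5[locations]"),
)
conn.commit()
updated_rows = cur.rowcount  # number of rows the UPDATE modified

remaining_old = conn.execute(
    "SELECT COUNT(*) FROM postings WHERE docId = ?", ("doc5[locations]",)
).fetchone()[0]
print(updated_rows, remaining_old)
```

For a very large index, an index on `docId` (in addition to one on `termId`) would likely be needed to keep this update from scanning the whole table.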
