keyword - How do I generate keywords for documents stored in MarkLogic?

Question

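I need to generate a list of keywords for each document in a set of documents that are loaded into MarkLogic. I am considering running cts:distinctive-terms against the set of documents, but cannot figure out how to get a list of keywords for each document rather than a list of terms relevant to the set. Can anyone suggest a solution?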

score 3 · Accepted Answer

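Did you use the score=logtf option? When I tried that, the scores for stop words went up a lot. That makes sense if you think about it: the database can no longer use IDF to weed them out. But if you want TF only, you could filter the results with a stop-word list, as has already been suggested.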

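The logtfidf score, however, should naturally penalize stop words. You can set the min-val option, or other options, to tune the results. For example, here I set min-val to 27 because stop words started to appear at 26. The right value will depend on the existing database contents, because IDF is computed against them.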

cts:distinctive-terms(
  text { 'I need to generate a list of keywords for each document in a set of documents that are loaded into MarkLogic. I am considering running cts:distinctive-terms against the set of documents, but cannot figure out how to get a list of keywords for each document rather than a list of terms relevant to the set. Can anyone suggest a solution?' },
  <options xmlns="cts:distinctive-terms"
   xmlns:db="http://marklogic.com/xdmp/database">
    <min-val>27</min-val>
    <use-db-config>false</use-db-config>
    <db:stemmed-searches>true</db:stemmed-searches>
    <db:word-searches>false</db:word-searches>
    <db:fast-phrase-searches>false</db:fast-phrase-searches>
  </options>)/cts:term/cts:word-query/cts:text/string()
=>
load
set
solution
term
document
list
keyword
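
For the TF-only route mentioned above, a minimal sketch might look like the following. It assumes the score option accepts logtf as described in the answer, and the stop-word list is a hypothetical placeholder that you would replace with one suited to your content.

```xquery
xquery version "1.0-ml";

(: Hypothetical stop-word list, for illustration only. :)
declare variable $STOP-WORDS as xs:string+ :=
  ("a", "an", "and", "for", "i", "in", "of", "that", "the", "to");

let $text := text { "I need to generate a list of keywords for each document" }
let $terms :=
  cts:distinctive-terms(
    $text,
    <options xmlns="cts:distinctive-terms">
      <!-- Rank by term frequency only; IDF no longer demotes stop words. -->
      <score>logtf</score>
    </options>)
(: Keep only word-query terms, then drop anything on the stop-word list. :)
for $word in $terms/cts:term/cts:word-query/cts:text/string()
where fn:not(fn:lower-case($word) = $STOP-WORDS)
return $word
```

Because the filter runs after scoring, it may be worth raising max-terms so that enough candidates survive once the stop words are removed.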

score 3 · Accepted Answer

Just iterate over the documents of interest and call cts:distinctive-terms on each one separately:

for $doc in doc()
return
    cts:distinctive-terms($doc)
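
To keep each keyword list attached to its document, the loop can be extended roughly as below. This is a sketch, not a definitive implementation: the collection name "my-docs" and the keywords/keyword wrapper elements are hypothetical, and the path into the result is borrowed from the other answer.

```xquery
xquery version "1.0-ml";

(: "my-docs" is a hypothetical collection name; restricting the loop this
   way avoids walking every document in the database, as doc() would. :)
for $doc in fn:collection("my-docs")
let $keywords :=
  cts:distinctive-terms($doc)/cts:term/cts:word-query/cts:text/string()
return
  (: One wrapper element per document, keyed by the document URI. :)
  <keywords uri="{ xdmp:node-uri($doc) }">{
    for $k in fn:distinct-values($keywords)
    return <keyword>{ $k }</keyword>
  }</keywords>
```

Terms that are not plain word queries are skipped by the path expression, so depending on the database index settings a document may yield fewer keywords than cts:distinctive-terms returns terms.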
