
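Given a set of documents containing text, I want to search for a phrase, return all of the matches, and rank them. I know how to get Lucene/Solr to tell me which documents match and to highlight the matches within a document, but how do I get a ranking that includes multiple matches from the same document?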

First document.  It has a single line of text.
Second document.  This text line is quite short.
This is another line containing more text and is a bit longer.

If I search for "text line", I would expect it to find three matches, ranked as follows:

2nd document -> ...This "text line" is quite short.
1st document -> ...It has a single "line of text".
2nd document -> ...another "line containing more text" and is...

Is this possible? If so, how?


1 Answer


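If you want one match per line, index each line as its own document. Don't confuse the term "document" with whether the text actually lives in a single file.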

If you want to keep a link back to the file, just index its id in a separate (stored) field.

{ id: "myfile.txt",
  text: "first line" }

{ id: "myfile.txt",
  text: "second line" }
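
As a rough illustration of the splitting step, here is a minimal Python sketch that turns one file into one document per line. It only builds the documents and does not talk to Solr. The field names `file` and `line_no` and the `myfile#N` id scheme are hypothetical, chosen for this example. It also assumes that `id` is the index's unique key, in which case each line would need its own id and the file name would go in a separate stored field.

```python
def lines_to_docs(file_id, text):
    """Split one file's text into one index document per non-empty line."""
    docs = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        line = line.strip()
        if not line:
            continue  # skip blank lines; there is nothing to match in them
        docs.append({
            "id": f"{file_id}#{line_no}",  # hypothetical unique key, one per line
            "file": file_id,               # hypothetical stored field linking back to the file
            "line_no": line_no,            # hypothetical field: 1-based line number
            "text": line,                  # the searchable line itself
        })
    return docs


# The second example document from the question has two lines,
# so it becomes two separately ranked documents.
docs = lines_to_docs(
    "second.txt",
    "Second document.  This text line is quite short.\n"
    "This is another line containing more text and is a bit longer.",
)
```

Each resulting document is scored on its own, so two lines from the same file can appear at different positions in the ranking, and the `file` field tells you which file a hit came from.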
answered 2012-01-17T19:14:09.190