I have a fairly large database of 4.5M documents. With the default query parser, the document I'm looking for appears in the results, as it should.
{
"responseHeader":{
"status":0,
"QTime":0,
"params":{
"q":"\"I predict a riot\"",
"rows":"1"}},
"response":{
"numFound":15,"start":0,"docs":[
{
"artist":"Kaiser Chiefs",
"text":"<p>Oh, watchin' the people get lairy<br>It's not very pretty, I tell thee<br>Walkin' through town is quite scary<br>And not very sensible either<br>A friend of a friend he got beaten<br>He looked the wrong way at a policeman<br>Would never have happened to Smeaton<br>An old Leodiensian<br><br>I predict a riot, I predict a riot<br>I predict a riot, I predict a riot<br><br>Oh, I try to get to my taxi<br>A man in a tracksuit attacks me<br>He said that he saw it before me<br>Wants to get things a bit gory<br>Girls scrabble round with no clothes on<br>To borrow a pound for a condom<br>If it wasn't for chip fat, they'd be frozen<br>They're not very sensible<br><br>I predict a riot, I predict a riot<br>I predict a riot, I predict a riot<br><br>And if there's anybody left in here<br>That doesn't want to be out there<br><br>Ow!<br><br>Oh, watchin' the people get lairy<br>It's not very pretty, I tell thee<br>Walkin' through town is quite scary<br>Not very sensible<br><br>I predict a riot, I predict a riot<br>I predict a riot, I predict a riot<br><br>And if there's anybody left in here<br>That doesn't want to be out there<br><br>I predict a riot, I predict a riot<br>I predict a riot, I predict a riot</p>",
"_ts":6341730138387906561,
"title":"I predict a riot",
"id":"redacted"}]
}}
However, when I switch to the DisMax query parser with all the additional parameters, this is what I get:
{
"responseHeader": {
"status": 0,
"QTime": 1,
"params": {
"q": "\"I predict a riot\"",
"defType": "dismax",
"ps": "0",
"qf": "text",
"echoParams": "all",
"pf": "text^5",
"wt": "json"
}
},
"response": {
"numFound": 0,
"start": 0,
"docs": []
}
}
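Nothing... If I remove the quotes, it finds some very irrelevant results (songs by an artist named "I"). In case it isn't clear: "I predict a riot" is present in the text field of this document, several times even.
I'm new to Solr and I don't understand what is wrong with this query. I tried changing qf and pf to "artist text title", but still got nothing.
Ideally, the goal is to find matches in all three fields, with a big boost if all the words are found in the same order in title, artist, or text. But even this simple test doesn't seem to work. :-/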
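To make the intended request concrete, here is a minimal Python sketch of the parameters described above: qf across all three fields, and pf across the same fields with a phrase boost. The endpoint URL, the core name `lyrics`, the helper names, and the uniform boost of 5 are assumptions for illustration only; `defType`, `qf`, `pf`, `ps`, `debugQuery`, and `wt` are standard Solr request parameters.

```python
from urllib.parse import urlencode

# Hypothetical endpoint: replace host, port, and core name with your own.
SOLR_SELECT_URL = "http://localhost:8983/solr/lyrics/select"


def build_dismax_params(user_query, fields=("title", "artist", "text"), phrase_boost=5):
    """Return DisMax request parameters that search `fields` and boost phrase matches."""
    return {
        "q": user_query,
        "defType": "dismax",
        # qf: fields in which the individual query terms are searched
        "qf": " ".join(fields),
        # pf: fields in which the whole query is additionally tried as a phrase
        "pf": " ".join(f"{name}^{phrase_boost}" for name in fields),
        # ps=0: the phrase boost applies only when the words are adjacent and in order
        "ps": "0",
        # debugQuery=true: ask Solr to include the parsed query in the response
        "debugQuery": "true",
        "wt": "json",
    }


def build_url(user_query):
    """Return the full select URL with URL-encoded parameters."""
    return SOLR_SELECT_URL + "?" + urlencode(build_dismax_params(user_query))


if __name__ == "__main__":
    print(build_url("I predict a riot"))
```

This only builds the request; whether it returns the expected document still depends on how the fields are defined and analyzed in the schema.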
Thanks!
Edit: Using these parameters
"params": {
"q": "I predict a riot",
"defType": "dismax",
"qf": "text artist title",
"echoParams": "all",
"pf": "text^5",
"rows": "100",
"wt": "json"
}
gives me this debug query:
"debug": {
"rawquerystring": "I predict a riot",
"querystring": "I predict a riot",
"parsedquery": "(+(DisjunctionMaxQuery((text:I | title:I | artist:I)) DisjunctionMaxQuery((text:predict | title:predict | artist:predict)) DisjunctionMaxQuery((text:a | title:a | artist:a)) DisjunctionMaxQuery((text:riot | title:riot | artist:riot))) DisjunctionMaxQuery(((text:I predict a riot)^5.0)))/no_coord",
"parsedquery_toString": "+((text:I | title:I | artist:I) (text:predict | title:predict | artist:predict) (text:a | title:a | artist:a) (text:riot | title:riot | artist:riot)) ((text:I predict a riot)^5.0)",
"QParser": "DisMaxQParser",
"altquerystring": null,
"boostfuncs": null
}
I get bad results, i.e. an artist called "I", but not the Kaiser Chiefs song, which contains the query several times in its title and text.
Field definitions:
<field name="title" type="string" indexed="true" stored="true"/>
<field name="artist" type="string" indexed="true" stored="true"/>
<field name="text" type="string" indexed="true" stored="true"/>
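One thing that may be worth checking with these definitions, offered as an assumption rather than a confirmed diagnosis: in the stock Solr schemas the `string` type is backed by `solr.StrField`, which is not tokenized, so the whole field value is indexed as a single term. The Python sketch below is a simplified model of that behaviour, not Solr's implementation, and the function names are hypothetical. It would be consistent with an artist literally named "I" matching `artist:I` while `text:riot` matches nothing.

```python
def string_field_matches(stored_value, term):
    """Model of an untokenized field: the entire value is one indexed term."""
    return stored_value == term


def tokenized_field_matches(stored_value, term):
    """Model of a tokenized, lowercased field: each word is its own term."""
    tokens = stored_value.lower().split()
    return term.lower() in tokens


if __name__ == "__main__":
    title = "I predict a riot"
    # A single word does not equal the whole untokenized value.
    print(string_field_matches(title, "riot"))
    # The exact, complete value does match.
    print(string_field_matches(title, "I predict a riot"))
    # An artist whose entire name is "I" matches the single term "I".
    print(string_field_matches("I", "I"))
    # With tokenization, individual words become searchable.
    print(tokenized_field_matches(title, "riot"))
```

If this is what is happening, switching the fields to a tokenized text type (for example `text_general` in the default configsets) and reindexing would be the thing to try.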