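I have a Sphinx index, schools, and whenever I query it I get the same results in the same order. I have tried every combination of ranking, sorting, and matching I can think of, and the ordering never changes.

Here is an example of the bad results I get: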

"albany high"

Albany Junior High School | Auckland, NZ | 2001 (shouldn't be first)
Albany High School        | Albany, NY   | 2001
South Albany High School  | Albany, OR   | 2001
Albany High School        | Albany, CA   | 1001 (shouldn't be last)

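As you can see, the top-ranked school is not in a city named "Albany" and should rank lower, while the lowest-ranked "Albany High School" should rank higher than it does. The same problem shows up across many search terms.

The Sphinx index looks like this: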

source schools : root
{
    sql_query = \
        SELECT schools.id, schools.name, schools.state, schools.country, schools.city, \
        (select COUNT(*) from user2school WHERE school_id = schools.id) as user_count \
        FROM schools

    sql_attr_uint       = user_count
}

index schools
{
    source                  = schools
    path                    = /var/db/sphinx/data/schools
    min_infix_len           = 3
    infix_fields            = name
}

The code that produces the results is:

$sphinx->SetMatchMode(SPH_MATCH_EXTENDED);
$sphinx->SetRankingMode(SPH_RANK_WORDCOUNT);
$sphinx->SetSortMode(SPH_SORT_RELEVANCE);

$sphinx->SetFieldWeights(array(
    'id' => 0,
    'name' => 1000,
    'city' => 0,
    'state' => 0,
    'user_count' => 0
));

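How do I get Sphinx to honor my custom weights? Every combination I have tried seems to fail.


Edit:

Here is another example that yields the same order with completely different settings. The only option I set here is: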

$sphinx->SetRankingMode(SPH_RANK_SPH04);

Results:

"albany high"

Albany Junior High School | Auckland, NZ | 3 (still shouldn't be first)
Albany High School        | Albany, NY   | 3
South Albany High School  | Albany, OR   | 2
Albany High School        | Albany, CA   | 1 (still shouldn't be last)

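As you can see, the order is the same. It is identical for every combination of ranking, sorting, and weighting I have tried. Is there anything I can do to debug this?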

2 Answers

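Perhaps it is a logic error in your application. Sphinx gives you a list of IDs, which you then use to retrieve the data from the original database. Perhaps you are not ordering those rows correctly.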
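
To illustrate the point about row ordering, here is a minimal sketch of restoring the Sphinx order after the database lookup. The helper name orderRowsBySphinxIds and the sample rows are hypothetical; the assumption is that each row carries its primary key under 'id' and that the ID list is already in ranked order (with the PHP API in its default mode, array_keys($result['matches']) should give that list).

```php
<?php
// Hypothetical helper: reorder database rows to match the ID order Sphinx returned.
function orderRowsBySphinxIds(array $sphinxIds, array $rows): array
{
    // Index the rows by primary key so each ID can be looked up directly.
    $byId = [];
    foreach ($rows as $row) {
        $byId[$row['id']] = $row;
    }

    // Walk the IDs in Sphinx's ranked order; skip IDs the database did not return.
    $ordered = [];
    foreach ($sphinxIds as $id) {
        if (isset($byId[$id])) {
            $ordered[] = $byId[$id];
        }
    }
    return $ordered;
}

// Ranked order from Sphinx versus the order a plain "WHERE id IN (...)" may return.
$sphinxIds = [2, 4, 3, 1];
$rows = [
    ['id' => 1, 'name' => 'Albany Junior High School'],
    ['id' => 2, 'name' => 'Albany High School'],
    ['id' => 3, 'name' => 'South Albany High School'],
    ['id' => 4, 'name' => 'Albany High School'],
];

$ordered = orderRowsBySphinxIds($sphinxIds, $rows);
echo implode(', ', array_column($ordered, 'id')), "\n"; // prints "2, 4, 3, 1"
```

If the rows come back in primary-key order no matter which ranker is set, the symptom would look exactly like the one described in the question.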

I just tried inserting your data into a test RT index (including a string attribute, so the data is visible):

mysql> insert into rttest values (1,'Albany Junior High School','Auckland','NZ','Albany Junior High School, Auckland, NZ');
   ... etc ...

mysql> select * from rttest where match('albany high');
+------+--------+-----------------------------------------+
| id   | weight | value                                   |
+------+--------+-----------------------------------------+
|    2 |   3267 | Albany High School, Albany, NY          |
|    3 |   3267 | South Albany High School, Albany, OR    |
|    4 |   3267 | Albany High School, Albany, CA          |
|    1 |   1304 | Albany Junior High School, Auckland, NZ |
+------+--------+-----------------------------------------+
4 rows in set (0.15 sec)

mysql> select * from rttest where match('albany high') option ranker=sph04;
+------+--------+-----------------------------------------+
| id   | weight | value                                   |
+------+--------+-----------------------------------------+
|    2 |  12267 | Albany High School, Albany, NY          |
|    4 |  12267 | Albany High School, Albany, CA          |
|    3 |  10267 | South Albany High School, Albany, OR    |
|    1 |   6304 | Albany Junior High School, Auckland, NZ |
+------+--------+-----------------------------------------+
4 rows in set (0.00 sec)

mysql> select * from rttest where match('albany high') option ranker=wordcount;
+------+--------+-----------------------------------------+
| id   | weight | value                                   |
+------+--------+-----------------------------------------+
|    2 |      3 | Albany High School, Albany, NY          |
|    3 |      3 | South Albany High School, Albany, OR    |
|    4 |      3 | Albany High School, Albany, CA          |
|    1 |      2 | Albany Junior High School, Auckland, NZ |
+------+--------+-----------------------------------------+
4 rows in set (0.00 sec)

Changing the ranking mode does have an effect.
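
To check the same thing from the PHP side, one option is to print the ID and weight of each match before any database lookup happens. The sketch below uses a hypothetical helper, describeMatches, and a hand-built array shaped like the result of SphinxClient::Query() in its default mode (matches keyed by document ID); the sample weights are copied from the sph04 output above rather than from a live query.

```php
<?php
// Hypothetical helper: list "id / weight" pairs in the order Sphinx returned them.
function describeMatches(array $result): array
{
    $lines = [];
    // Assumes the default (non-arrayresult) mode, where 'matches' is keyed by document ID.
    foreach ($result['matches'] ?? [] as $id => $match) {
        $lines[] = sprintf('id=%d weight=%d', $id, $match['weight']);
    }
    return $lines;
}

// Hand-built stand-in for a Query() result; weights taken from the sph04 run above.
$result = [
    'matches' => [
        2 => ['weight' => 12267, 'attrs' => []],
        4 => ['weight' => 12267, 'attrs' => []],
        3 => ['weight' => 10267, 'attrs' => []],
        1 => ['weight' => 6304, 'attrs' => []],
    ],
];

echo implode("\n", describeMatches($result)), "\n";
```

If the weights printed here change with the ranker but the page still shows the old order, the reordering is happening after Sphinx, in the application.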

answered 2012-07-19T11:07:22.790

The 0s in SetFieldWeights look odd. Either list only the fields whose weights you want to set, or use 1 as the default. I suspect the 0s are causing the problem.
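
As a rough sketch of the first option, the array from the question can be trimmed before it is passed to SetFieldWeights. The helper cleanFieldWeights is hypothetical, and the list of full-text fields is my reading of the source config: id is the document ID and user_count is declared with sql_attr_uint, so neither should be a full-text field. Whether the zeros are the actual cause remains unconfirmed.

```php
<?php
// Hypothetical helper: keep only real full-text fields that have a positive weight.
function cleanFieldWeights(array $weights, array $fullTextFields): array
{
    $clean = [];
    foreach ($weights as $field => $weight) {
        if (in_array($field, $fullTextFields, true) && $weight > 0) {
            $clean[$field] = $weight;
        }
    }
    return $clean;
}

// Assumed full-text fields, based on the sql_query in the question's source block.
$fullTextFields = ['name', 'state', 'country', 'city'];

// The weights exactly as given in the question.
$original = [
    'id' => 0,
    'name' => 1000,
    'city' => 0,
    'state' => 0,
    'user_count' => 0,
];

$weights = cleanFieldWeights($original, $fullTextFields);
print_r($weights); // only 'name' => 1000 remains

// With a connected client, the trimmed array would then be passed as before:
// $sphinx->SetFieldWeights($weights);
```

Fields left out of the array keep the default weight of 1, so name would still dominate without zeroing the others.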

I suspect SPH_RANK_SPH04 is the best fit for this particular case.

Your setSelect shouldn't be needed either.

answered 2012-07-17T11:11:10.023