
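Okay, so I have a nice little query that returns scored results. The query is currently LIKE-based, and I'd like to convert it to full-text, since everyone keeps telling me to. I'd like to get the same ordering of results, even if the scores themselves differ. The only way I've been able to get anything close is by unrolling my cross joins...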

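  • I want to be able to set scores for specific word combinations
  • I want to be able to set weights based on where the term is found
  • I don't want to search on the power set of the words in the search. That is, if the user enters "a railway employee", I never want to search for "a employee". I'm trying to search only contiguous groupings of terms from the query.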
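To make the third requirement concrete: a query of n words has n(n+1)/2 contiguous groupings, versus 2^n - 1 non-empty subsets in the power set. The Python sketch below enumerates only the contiguous ones. `contiguous_terms` and `STOPWORDS` are hypothetical, and the scoring rule (word count, with stopword-only phrases scoring 0) is an assumption chosen so that "a railway employee" reproduces the hand-written score table used in the queries.

```python
# Hypothetical stopword list; MySQL's real full-text stopword list is larger.
STOPWORDS = {"a", "an", "the"}


def contiguous_terms(query: str) -> list[tuple[int, str]]:
    """Return (score, phrase) for every contiguous run of words in `query`.

    Assumed scoring rule: a phrase scores its word count, except that a
    phrase made up only of stopwords scores 0. Non-contiguous combinations
    (e.g. "a employee") are never produced.
    """
    words = query.lower().split()
    n = len(words)
    terms = []
    for length in range(n, 0, -1):  # longest phrases first
        for start in range(n - length + 1):
            chunk = words[start:start + length]
            score = 0 if all(w in STOPWORDS for w in chunk) else length
            terms.append((score, " ".join(chunk)))
    return terms


for score, phrase in contiguous_terms("a railway employee"):
    print(score, phrase)
```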

How can I make my original query full-text based while still keeping it relatively small and organized?

You can see both queries on SQLFiddle.

Original query - nice and small, scores and search terms all in one place

SELECT
  sum(score * multiplier) score,
  a.id,
  a.title
FROM
(
  -- each contiguous phrase from the search, with its score
  SELECT 3 score, 'a railway employee' term UNION ALL
  SELECT 2 score, 'railway employee' term UNION ALL
  SELECT 2 score, 'a railway' term UNION ALL
  SELECT 1 score, 'employee' term UNION ALL
  SELECT 1 score, 'railway' term UNION ALL
  SELECT 0 score, 'a' term
) terms
CROSS JOIN
(
  -- weight for where the phrase was found: T = title, S = summary, C = content
  SELECT 'T' TYPE, 1 multiplier
  UNION ALL SELECT 'S', 1.1
  UNION ALL SELECT 'C', 1.5
) x
INNER JOIN
(
  -- every searchable text, tagged with its location type
  SELECT id, 'T' TYPE, title SEARCH FROM articles
  UNION ALL
  SELECT id, 'S' TYPE, summary SEARCH FROM articles WHERE summary <> ''
  UNION ALL
  SELECT artId, 'C' TYPE, content SEARCH FROM articleSections
) s ON s.TYPE = x.TYPE AND s.SEARCH LIKE concat('%', terms.term, '%')
INNER JOIN articles a ON a.id = s.id
WHERE terms.score > 0
GROUP BY a.id, a.title
ORDER BY score DESC, a.title;
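To experiment with the LIKE-based scoring without a MySQL server, here is a minimal sketch that runs the same query shape against an in-memory SQLite database using Python's standard `sqlite3` module. The schema (`articles(id, title, summary)`, `articleSections(artId, content)`) is inferred from the query, and the three sample rows are made-up data. SQLite spells string concatenation `||` rather than `concat()`, and its `LIKE` is case-insensitive for ASCII by default.

```python
import sqlite3


def make_demo_db() -> sqlite3.Connection:
    """In-memory database with a schema inferred from the question (made-up rows)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE articles (id INTEGER PRIMARY KEY, title TEXT, summary TEXT)")
    conn.execute("CREATE TABLE articleSections (artId INTEGER, content TEXT)")
    conn.executemany(
        "INSERT INTO articles VALUES (?, ?, ?)",
        [(1, "A railway employee retires", ""),
         (2, "Railway history", "Notes on an employee strike"),
         (3, "Gardening", "")],
    )
    conn.execute("INSERT INTO articleSections VALUES (3, 'He met a railway employee once.')")
    return conn


# Same shape as the MySQL query: phrases x location weights, matched with LIKE.
RANK_SQL = """
SELECT SUM(terms.score * x.multiplier) AS total, a.id, a.title
FROM (SELECT 3 AS score, 'a railway employee' AS term UNION ALL
      SELECT 2, 'railway employee' UNION ALL
      SELECT 2, 'a railway' UNION ALL
      SELECT 1, 'employee' UNION ALL
      SELECT 1, 'railway' UNION ALL
      SELECT 0, 'a') AS terms
CROSS JOIN (SELECT 'T' AS type, 1.0 AS multiplier UNION ALL
            SELECT 'S', 1.1 UNION ALL
            SELECT 'C', 1.5) AS x
JOIN (SELECT id, 'T' AS type, title AS search FROM articles
      UNION ALL
      SELECT id, 'S', summary FROM articles WHERE summary <> ''
      UNION ALL
      SELECT artId, 'C', content FROM articleSections) AS s
  ON s.type = x.type AND s.search LIKE '%' || terms.term || '%'
JOIN articles a ON a.id = s.id
WHERE terms.score > 0
GROUP BY a.id, a.title
ORDER BY total DESC, a.title
"""


def ranked_articles(conn: sqlite3.Connection) -> list:
    return conn.execute(RANK_SQL).fetchall()


for total, art_id, title in ranked_articles(make_demo_db()):
    print(f"{total:5.1f}  {art_id}  {title}")
```

Article 3 ranks first here only because its section content (weight 1.5) contains every phrase, which shows how the location multiplier interacts with the phrase scores.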

Full-text - messy and large, scores and search terms everywhere

SELECT
  sum(score * multiplier) score,
  id,
  title
FROM
(
SELECT
  3 score,
  1 multiplier,
  'T' AS loc,
  id,
  title
FROM articles
WHERE MATCH(title) AGAINST ('"a railway employee"' IN BOOLEAN MODE)
UNION ALL
SELECT
  2 score,
  1 multiplier,
  'T' AS loc,
  id,
  title
FROM articles
WHERE MATCH(title) AGAINST ('"railway employee"' IN BOOLEAN MODE)
UNION ALL
SELECT
  2 score,
  1 multiplier,
  'T' AS loc,
  id,
  title
FROM articles
WHERE MATCH(title) AGAINST ('"a railway"' IN BOOLEAN MODE)
UNION ALL
SELECT
  1 score,
  1 multiplier,
  'T' AS loc,
  id,
  title
FROM articles
WHERE MATCH(title) AGAINST ('railway' IN BOOLEAN MODE)
UNION ALL
SELECT
  1 score,
  1 multiplier,
  'T' AS loc,
  id,
  title
FROM articles
WHERE MATCH(title) AGAINST ('employee' IN BOOLEAN MODE)
UNION ALL


SELECT
  3 score,
  1.1 multiplier,
  'S' AS loc,
  id,
  title
FROM articles
WHERE MATCH(summary) AGAINST ('"a railway employee"' IN BOOLEAN MODE)
UNION ALL
SELECT
  2 score,
  1.1 multiplier,
  'S' AS loc,
  id,
  title
FROM articles
WHERE MATCH(summary) AGAINST ('"railway employee"' IN BOOLEAN MODE)
UNION ALL
SELECT
  2 score,
  1.1 multiplier,
  'S' AS loc,
  id,
  title
FROM articles
WHERE MATCH(summary) AGAINST ('"a railway"' IN BOOLEAN MODE)
UNION ALL
SELECT
  1 score,
  1.1 multiplier,
  'S' AS loc,
  id,
  title
FROM articles
WHERE MATCH(summary) AGAINST ('railway' IN BOOLEAN MODE)
UNION ALL
SELECT
  1 score,
  1.1 multiplier,
  'S' AS loc,
  id,
  title
FROM articles
WHERE MATCH(summary) AGAINST ('employee' IN BOOLEAN MODE)
UNION ALL


SELECT
  3 score,
  1.5 multiplier,
  'C' AS loc,
  id,
  title
FROM articleSections
INNER JOIN articles a ON a.id = artId
WHERE MATCH(content) AGAINST ('"a railway employee"' IN BOOLEAN MODE)
UNION ALL
SELECT
  2 score,
  1.5 multiplier,
  'C' AS loc,
  id,
  title
FROM articleSections
INNER JOIN articles a ON a.id = artId
WHERE MATCH(content) AGAINST ('"railway employee"' IN BOOLEAN MODE)
UNION ALL
SELECT
  2 score,
  1.5 multiplier,
  'C' AS loc,
  id,
  title
FROM articleSections
INNER JOIN articles a ON a.id = artId
WHERE MATCH(content) AGAINST ('"a railway"' IN BOOLEAN MODE)
UNION ALL
SELECT
  1 score,
  1.5 multiplier,
  'C' AS loc,
  id,
  title
FROM articleSections
INNER JOIN articles a ON a.id = artId
WHERE MATCH(content) AGAINST ('railway' IN BOOLEAN MODE)
UNION ALL
SELECT
  1 score,
  1.5 multiplier,
  'C' AS loc,
  id,
  title
FROM articleSections
INNER JOIN articles a ON a.id = artId
WHERE MATCH(content) AGAINST ('employee' IN BOOLEAN MODE)

) t
WHERE score > 0
GROUP BY id, title
ORDER BY score DESC, title;
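One possible workaround (a suggestion, not something from the question) is to keep the scores and terms in one place and generate the unrolled query from them, mirroring the `terms` and `x` derived tables of the LIKE version. In this Python sketch, `TERMS`, `LOCATIONS` and `build_fulltext_query` are hypothetical; it only builds the SQL string with `%s` placeholders (the parameter style used by MySQL DB-API drivers such as PyMySQL) and does not connect to a database.

```python
# Hypothetical configuration: the same data as the original query's
# `terms` and `x` derived tables, kept together in one place.
TERMS = [
    (3, "a railway employee"),
    (2, "railway employee"),
    (2, "a railway"),
    (1, "employee"),
    (1, "railway"),
    (0, "a"),
]

# (location code, multiplier, FROM clause, column passed to MATCH())
LOCATIONS = [
    ("T", 1.0, "articles a", "a.title"),
    ("S", 1.1, "articles a", "a.summary"),
    ("C", 1.5, "articleSections s INNER JOIN articles a ON a.id = s.artId", "s.content"),
]


def build_fulltext_query(terms, locations):
    """Return (sql, params) for the unrolled UNION ALL full-text query."""
    branches, params = [], []
    for loc, multiplier, from_clause, column in locations:
        for score, term in terms:
            if score <= 0:
                continue  # zero-score terms cannot change the total
            branches.append(
                f"SELECT {score} AS score, {multiplier} AS multiplier, '{loc}' AS loc, "
                f"a.id, a.title FROM {from_clause} "
                f"WHERE MATCH({column}) AGAINST (%s IN BOOLEAN MODE)"
            )
            # Multi-word terms become quoted phrases in BOOLEAN MODE.
            params.append(f'"{term}"' if " " in term else term)
    sql = (
        "SELECT SUM(score * multiplier) AS score, id, title FROM (\n"
        + "\nUNION ALL\n".join(branches)
        + "\n) t\nGROUP BY id, title\nORDER BY score DESC, title"
    )
    return sql, params


sql, params = build_fulltext_query(TERMS, LOCATIONS)
print(sql)
print(params)
```

Scores, multipliers and clauses are interpolated from trusted constants; only the search phrases are passed as parameters.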

1 Answer


Too long for a comment.

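Clearly, you have very specific scoring requirements that neither natural language mode nor boolean mode can satisfy. I wonder whether MySQL has some hidden mechanism that gives you the list of keywords that matched the search, which you could then use for scoring. I don't know of one.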

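If you have a large corpus and relatively rare words (meaning the words you are looking for appear in relatively few documents), you can use boolean mode to cut down the search space first. Such a query would look like: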

select t.id, sum(terms.score * wherefactor.factor) as score
from (select t.*
      . . .
      where MATCH(title, summary, content) AGAINST ('railway employee' IN BOOLEAN MODE)
     ) t left outer join
     ((SELECT 3 score, 'a railway employee' term UNION ALL
       SELECT 2 score, 'railway employee' term UNION ALL
       SELECT 2 score, 'a railway' term UNION ALL
       SELECT 1 score, 'employee' term UNION ALL
       SELECT 1 score, 'railway' term UNION ALL
       SELECT 0 score, 'a' term
      ) terms cross join
      (SELECT 'T' as which, 1.0 as factor UNION ALL
       SELECT 'S', 1.1 UNION ALL
       SELECT 'C', 1.5
      ) wherefactor
     )
     on (case when wherefactor.which = 'T' then t.title
              when wherefactor.which = 'S' then t.summary
              when wherefactor.which = 'C' then t.content
         end) like concat('%', terms.term, '%')
group by t.id
order by score desc;

This should give you the performance of full-text search together with the specificity of your own scoring algorithm.
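The two phases can be sketched in plain Python to show the idea: a cheap prefilter that keeps only documents sharing at least one non-stopword with the search (loosely analogous to an operator-free `MATCH ... AGAINST (... IN BOOLEAN MODE)`), followed by exact substring scoring on the survivors (analogous to `LIKE '%term%'`). The document layout, the stopword set, and the function names are assumptions for illustration; this does not reproduce MySQL's full-text tokenizer.

```python
import re

STOPWORDS = {"a", "an", "the"}  # hypothetical; MySQL uses its own list

# Made-up documents; "T", "S", "C" = title, summary, section content.
DOCS = [
    {"id": 1, "T": "A railway employee retires", "S": "", "C": ""},
    {"id": 2, "T": "Railway history", "S": "Notes on an employee strike", "C": ""},
    {"id": 3, "T": "Gardening", "S": "", "C": "He met a railway employee once."},
    {"id": 4, "T": "Cooking", "S": "A recipe", "C": ""},
]
TERMS = [(3, "a railway employee"), (2, "railway employee"), (2, "a railway"),
         (1, "employee"), (1, "railway")]
FACTORS = {"T": 1.0, "S": 1.1, "C": 1.5}


def words(text: str) -> set[str]:
    return set(re.findall(r"\w+", text.lower()))


def two_phase_rank(docs, query, terms, factors):
    """Return [(score, doc_id)] sorted by score descending, then id."""
    query_words = words(query) - STOPWORDS
    ranked = []
    for doc in docs:
        # Phase 1: cheap prefilter. Drop documents that share no
        # non-stopword with the search (the full-text index's job in MySQL).
        if not any(query_words & words(doc[loc]) for loc in factors):
            continue
        # Phase 2: exact phrase scoring on the survivors (the LIKE part).
        total = 0.0
        for loc, factor in factors.items():
            text = doc[loc].lower()
            for score, term in terms:
                if term in text:
                    total += score * factor
        ranked.append((total, doc["id"]))
    ranked.sort(key=lambda r: (-r[0], r[1]))
    return ranked


print(two_phase_rank(DOCS, "a railway employee", TERMS, FACTORS))
```

Document 4 never reaches phase 2, which is where the savings come from when the search words are rare.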

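If you have a known lexicon, another possibility is to build a document-term table. Such a table would have one row for each document and each term in it that you care about (that set of terms is called the "lexicon"). With such a data structure, you are free to implement whatever scoring mechanism you choose.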
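A minimal sketch of such a document-term table, using Python's `sqlite3` module so it runs anywhere. It assumes a hypothetical fixed `LEXICON` of phrases and adds a `loc` column (title/summary/content) so that the question's location weights can be applied; term presence is tested by simple substring matching, which is a simplification. Once the table is built, scoring is just a join plus `GROUP BY`, and the weights can change per query without re-indexing.

```python
import sqlite3

# Hypothetical fixed lexicon: the only phrases that get indexed.
LEXICON = ["a railway employee", "railway employee", "a railway", "employee", "railway"]

# Made-up documents; "T", "S", "C" = title, summary, section content.
DOCS = [
    {"id": 1, "T": "A railway employee retires", "S": "", "C": ""},
    {"id": 2, "T": "Railway history", "S": "Notes on an employee strike", "C": ""},
    {"id": 3, "T": "Gardening", "S": "", "C": "He met a railway employee once."},
]


def build_doc_terms(conn, docs):
    """Index step: one row per (document, location, lexicon term) present."""
    conn.execute("CREATE TABLE doc_terms (doc_id INTEGER, loc TEXT, term TEXT)")
    rows = []
    for doc in docs:
        for loc in ("T", "S", "C"):
            text = doc[loc].lower()
            # Simplification: substring presence, like LIKE '%term%'.
            rows.extend((doc["id"], loc, term) for term in LEXICON if term in text)
    conn.executemany("INSERT INTO doc_terms VALUES (?, ?, ?)", rows)


def rank(conn, term_scores, loc_factors):
    """Query step: any scoring scheme is just a join plus GROUP BY."""
    conn.execute("CREATE TEMP TABLE q_terms (term TEXT PRIMARY KEY, score REAL)")
    conn.execute("CREATE TEMP TABLE q_locs (loc TEXT PRIMARY KEY, factor REAL)")
    conn.executemany("INSERT INTO q_terms VALUES (?, ?)", term_scores.items())
    conn.executemany("INSERT INTO q_locs VALUES (?, ?)", loc_factors.items())
    rows = conn.execute(
        """SELECT dt.doc_id, SUM(q.score * l.factor) AS total
           FROM doc_terms dt
           JOIN q_terms q ON q.term = dt.term
           JOIN q_locs l ON l.loc = dt.loc
           GROUP BY dt.doc_id
           ORDER BY total DESC, dt.doc_id"""
    ).fetchall()
    conn.execute("DROP TABLE q_terms")  # per-query weights are discarded
    conn.execute("DROP TABLE q_locs")
    return rows


conn = sqlite3.connect(":memory:")
build_doc_terms(conn, DOCS)
print(rank(conn,
           {"a railway employee": 3, "railway employee": 2, "a railway": 2,
            "employee": 1, "railway": 1},
           {"T": 1.0, "S": 1.1, "C": 1.5}))
```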

answered 2013-06-24T13:34:48.677